CHAPTER I: THEORY OF OPTIMAL PROPERTIES OF MAXIMUM LIKELIHOOD
ESTIMATORS UNDER GENERAL CONDITIONS

1.1  Introduction 

The purpose of this work is to establish optimality properties
for the maximum likelihood estimators, MLE, of a wide variety of
econometric models, under assumptions which are as weak as possible.
Theorems abound in the literature on optimality properties for MLE.
The strength of the assumptions, as well as of the conclusions, varies.
References to this literature follow in Section 1.2. We shall be
interested in proving weak consistency, asymptotic normality and
efficiency in the maximum probability, MP, sense. Section 1.3
contains references to the literature for these properties. Section 1.4
contains theorems which allow one to prove that the assumptions required
for our optimality properties are fulfilled. The theorem of section
1.5 allows us to prove the optimality properties for one set of
parameters, then conclude that they hold on an image set, under certain
conditions on the map. Several lemmas which will be useful throughout
the following chapters are contained in section 1.6.

1.2 References to the Literature

As pointed out by Crowder [1976], proofs of optimality properties
for MLE with dependent observations generally follow one of two lines:
that of Wald [1949] or that of Cramer [1946]. Wald's proof of strong
consistency of the MLE involves the assumption that the observations are
independent and identically distributed, i.i.d., but no differentiability
assumptions are required. These assumptions characterize the
continuity and integrability of various functions of the observations'
joint density. Neither asymptotic normality nor efficiency results of
any kind were obtained. However, Wald pointed out that the method of
proof could be extended to dependent variables, under certain conditions.
Silvey [1961] adopts a similar argument in his proof of weak consistency.
The underlying idea of his proof is that


1.1  $R(n,\theta,\theta^0) = \log L_n(\theta) - \log L_n(\theta^0)$

is a martingale, where $L_n(\theta)$ is the likelihood function of the
observations when $\theta \in \Theta \subset R^k$ is the unknown parameter
value. Since the ML method chooses $\theta$ over $\theta^*$ if and only if
$R(n,\theta,\theta^*) > 0$, conditions are required which imply that the
probability that $R(n,\theta,\theta^0) > 0$, for all $\theta$ outside any
given neighborhood of $\theta^0$, goes to zero as $n \to \infty$.
These conditions are:

S1.  $\Theta$ is compact.

S2.  $\mathrm{Var}(R(n,\theta^0,\theta))^{1/2} / E(R(n,\theta^0,\theta))$
     goes to zero as $n \to \infty$ uniformly outside any open
     neighborhood of $\theta^0$, where the variance and expectation
     are computed under $\theta^0$.

S3.  A regularity condition which assures that $\log L_n(\theta)$ is
     "sufficiently continuous", i.e. that if $\theta^*$ and $\theta^{**}$
     are outside a given neighborhood of $\theta^0$, and
     $|\theta^* - \theta^{**}|$ is sufficiently small, then
     $R(n,\theta^0,\theta^*) > 0$ implies $R(n,\theta^0,\theta^{**}) > 0$.

Additional assumptions are required for asymptotic normality. Assuming
now that $\Theta \subset R^1$, these conditions are given as:

S4.  $\mathrm{Var}(-\partial^2 \log L_n(\theta)/\partial\theta^2 |_{\theta^0})^{1/2} / E(-\partial^2 \log L_n(\theta)/\partial\theta^2 |_{\theta^0})$
     goes to 0 as $n \to \infty$, where the variance and expectation
     are computed under $\theta^0$.

S5.  $\partial \log L_n(\theta)/\partial\theta$ is asymptotically
     normally distributed.


Silvey cites a martingale central limit theorem to guarantee assumption
five. Since $-\partial^2 \log L_n(\theta)/\partial\theta^2$ is a
submartingale under appropriate uniform convergence conditions, Silvey
concludes that assumption four is often fulfilled, since many
submartingales behave in this way.

As  mentioned  above,  Cramer's  proof  of  consistency  [1946]  for 
i.i.d.  random  variables  may  be  modified  so  that  it  applies  to  dependent 
variables. This is the approach taken by Wald [1948] for a scalar
unknown $\theta$, where $\theta$ is an interior point of a nondegenerate
interval, $A$. He also demonstrates asymptotic efficiency in the sense
described below. Let


$c_n(\theta^*) = E(\partial^2 \log L_n(\theta)/\partial\theta^2)|_{\theta=\theta^*}$

where the expectation is taken under $\theta^*$. Wald defines a sequence
of estimators, $\{t_n\}$, to be asymptotically efficient in the wide
sense if a random sequence $\{u_n\}$ exists, such that

$\lim_{n\to\infty} E(u_n) = 0$  and  $\lim_{n\to\infty} E(u_n^2) = 1$

where the expectation in both terms is computed under $\theta^0$, and

$(c_n(\theta^0))^{1/2}(t_n - \theta^0) - u_n$

converges stochastically to zero as $n \to \infty$. The term "wide sense"
is used because the asymptotic distribution of $\{t_n\}$ is not
necessarily normal.

Under  conditions  WC1  through  WC4  below,  the  likelihood  equation 
has  at  least  one  root  which  is  consistent.  Wald  shows  that  any  root 
which  is  consistent  is  also  asymptotically  efficient  in  the  wide  sense. 


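WC1.  The first, second and third derivatives of $L_n(\theta)$ exist for
      all $\theta$ in $A$, with probability one. Furthermore,

      $\mathrm{lub}_{\theta \in A} |\partial^i L_n(\theta)/\partial\theta^i|$,   $i = 1, 2, 3$

      is integrable over the observation space.

WC2.  For any $\theta$ in $A$,

      $\lim_{n\to\infty} c_n(\theta) = \infty$.

WC3.  $\mathrm{Var}(\partial^2 \log L_n(\theta)/\partial\theta^2)^{1/2} / E(\partial^2 \log L_n(\theta)/\partial\theta^2)$
      goes to 0 as $n \to \infty$, where the expectation and variance
      are computed under any $\theta$ in $A$.

WC4.  There exists a positive $\delta$ such that for any $\theta$ in $A$

      $(c_n(\theta))^{-1} E( \mathrm{lub}_{\theta^* : |\theta^* - \theta| < \delta} |\partial^3 \log L_n(\theta^*)/\partial\theta^3| )$

      is a bounded function of $n$, when the expectation is taken
      under $\theta$.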


Bar-Shalom  [1971]  gives  conditions  in  terms  of  the  density  of 
individual  observations,  conditional  upon  all  past  observations,  under 
which  the  optimality  properties  of  Wald  [1948]  hold.  Let 

$p_1 \equiv p(y_1; \theta)$

$p_k \equiv p(y_k / y_{k-1}, \ldots, y_1; \theta)$

be the conditional density of the k'th observation, $y_k$, when $\theta$
is the value of the unknown scalar parameter, for $k \ge 2$. The
unconditional density of $y_1$ is $p_1$. As before, assume $\Theta$ is the
domain of the unknown parameter, and $\theta^0$ is the true value. The
following regularity conditions must be satisfied for all $k$:

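B1.  $\partial^i \log p_k/\partial\theta^i$ exists for all
     $\theta \in \Theta$, for $i = 1,2,3$.

B2.  $E(\partial \log p_k/\partial\theta |_{\theta^0}) = 0$, where the
     expectation is taken under $\theta^0$.

B3.  $J_k(\theta^0) \equiv E((\partial \log p_k/\partial\theta)^2 |_{\theta^0}) \le C_1 < \infty$,
     where $C_1$ is independent of $k$.

B4.  $E(\partial^2 \log p_k/\partial\theta^2 |_{\theta^0}) = -J_k(\theta^0)$.

B5.  There exists a measurable function $H_k(y_1,\ldots,y_k)$ such that

     $|\partial^3 \log p_k/\partial\theta^3| < H_k(y_1,\ldots,y_k)$

     for all $\theta \in \Theta$, and which is finite, except on a set
     of measure zero.

B6.  $\lim_{|k-j|\to\infty} E( (\partial \log p_k/\partial\theta)(\partial \log p_j/\partial\theta) )|_{\theta^0} = 0$.

B7.  $\mathrm{Var}(\partial^2 \log p_k/\partial\theta^2)|_{\theta^0} \le C_2 < \infty$,
     where $C_2$ is independent of $k$, and

     $\lim_{|k-j|\to\infty} \mathrm{cov}(\partial^2 \log p_k/\partial\theta^2, \partial^2 \log p_j/\partial\theta^2)|_{\theta^0} = 0$.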

Under conditions B1 through B7, the MLE of $\theta^0$ is weakly consistent.
If condition six is strengthened to

B6'.  $E( (\partial \log p_k/\partial\theta)(\partial \log p_j/\partial\theta) )|_{\theta^0} = 0$,  for all $j \ne k$,

then the MLE is also wide sense asymptotically efficient.

Bhat  [1974]  shows  that  the  conditions  of  Bar-Shalom  may  be 
simplified  and  a martingale  central  limit  theorem  invoked  to  imply 
consistency  and  asymptotic  normality  with  asymptotic  efficiency  in  the 
wide  sense.  (Wald  [1948]  refers  to  the  latter  property  as  asymptotic 
efficiency  in  the  strict  sense.)  Bhat  requires  that 

BH1.  The range of the k'th observation given the previous
      observations does not depend on $\theta$. The derivatives
      up to third order of $p_k$ are continuous for $\theta \in \Theta$.
      Differentiation with respect to $\theta$ up to third order of
      $\log p_k$ may be carried out under the integral, where the
      integration is with respect to the observations.

BH2.  $|\partial^3 \log p_k/\partial\theta^3|$ is bounded in probability
      uniformly for all $y_1,\ldots,y_k$ and $k$.

BH3.  $\lim_{n\to\infty} n^{-1} \sum_{k=1}^{n} J_k(\theta^0) = J(\theta^0) > 0$.

BH4.  For some $\delta > 0$,

      $n^{-1-\delta/2} \sum_{k=1}^{n} E(|\partial \log p_k/\partial\theta|^{2+\delta} / y_1,\ldots,y_{k-1}) \to 0$  a.s.

BH5.  $n^{-1} \sum_{k=1}^{n} \mathrm{Var}(\partial^2 \log p_k/\partial\theta^2)|_{\theta^0}$
      remains finite as $n \to \infty$.

Condition BH1 implies that $\partial \log L_n(\theta)/\partial\theta$ is a
martingale. The fact that

$\sum_{k=1}^{n} [\partial^2 \log p_k/\partial\theta^2 + E((\partial \log p_k/\partial\theta)^2 / y_1,\ldots,y_{k-1})]$

is a martingale is also required.

Crowder [1976] also makes use of martingale theory to prove asymptotic
normality. His proof of consistency does not involve assumptions
about the third derivatives of the log likelihood function. To show
consistency, he defines $L_n''(\theta,\theta^0)$ to be the matrix of second
derivatives of the log likelihood function, with the rows evaluated at
possibly different points on the line segment between $\theta$ and
$\theta^0$.

The two term Taylor expansion of the first derivative of the log
likelihood is

1.3  $\partial \log L_n(\theta)/\partial\theta |_{\theta} = \partial \log L_n(\theta)/\partial\theta |_{\theta^0} + L_n''(\theta,\theta^0)(\theta - \theta^0)$

For weak consistency of the MLE Crowder requires that:

C1.  $L_n''(\theta,\theta^0)$ be continuous in $\theta$ throughout some
     neighborhood of $\theta^0$.

C2.  Uniform convergence conditions apply so that, when the
     expectation and variance are taken under $\theta^0$,

     $E(\partial \log L_n(\theta)/\partial\theta |_{\theta^0}) = 0$

     $\mathrm{Var}(\partial \log L_n(\theta)/\partial\theta |_{\theta^0}) = E(-L_n''(\theta^0,\theta^0)) = B_n$

     where $B_n$ is assumed to be positive definite.


C3.  There  exists  A > 0 and  a positive  sequence  c^,  c^  ■*  00 , 

. 0,  . 

not  functions  of  0_,  such  that  1 0_-0_  | = 6 <_  A =? 


P (0-0°  )T  B'1/2  L"  (Q°,0)(6-0°) 

P|  s_ 


as  n •»•<*>. 


Assumption  C3  implies  that 


' (0°  ,0  ) 

n n — — 


goes to infinity in some sense as $n \to \infty$; this corresponds to
assumption WC2 of Wald.

To prove asymptotic normality, 1.3 is evaluated at the MLE $\theta(n)$
to obtain

$\theta(n) - \theta^0 = -L_n''(\theta^0,\theta(n))^{-1} B_n B_n^{-1} (\partial \log L_n(\theta)/\partial\theta)|_{\theta^0}$

It follows from assumption C3 that

$B_n^{-1} L_n''(\theta^0,\theta(n)) \to I_k$

stochastically, as $n \to \infty$, where $I_k$ is the k by k identity matrix.
Crowder utilizes a martingale central limit theorem to establish that

$B_n^{-1/2} (\partial \log L_n(\theta)/\partial\theta)|_{\theta^0}$

is asymptotically normally distributed.
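The limiting behaviour described above can be illustrated numerically. The following is a minimal sketch and not part of Crowder's argument: it assumes a first-order autoregression y_t = rho*y_{t-1} + e_t with standard normal errors and known unit error variance, so that the observations are dependent. The function names, the parameter values, and the use of the stationary value n/(1 - rho^2) in place of B_n are assumptions made here for illustration only.

```python
import math
import random

def simulate_ar1(n, rho, rng):
    """Return y_0, ..., y_n from y_t = rho*y_{t-1} + e_t, e_t ~ N(0, 1)."""
    # Start from the stationary distribution N(0, 1/(1 - rho^2)).
    y = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho * rho))]
    for _ in range(n):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y

def ar1_mle(y):
    """MLE of rho conditional on y_0 (unit error variance assumed known)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

def standardized_errors(n=200, rho=0.5, reps=400, seed=0):
    """Draw reps values of B_n^{1/2} (rho_hat - rho), with B_n ~ n/(1 - rho^2)."""
    rng = random.Random(seed)
    b_n = n / (1.0 - rho * rho)  # assumed approximation to E(-L''_n)
    return [math.sqrt(b_n) * (ar1_mle(simulate_ar1(n, rho, rng)) - rho)
            for _ in range(reps)]

z = standardized_errors()
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
print("sample mean:", round(mean, 3), "sample variance:", round(var, 3))
```

If the normal approximation is adequate at this sample size, the printed mean should be near zero and the printed variance near one.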


1.3 Weiss's Theorem for Asymptotic Efficiency in the Maximum Probability
Sense

The definition of asymptotic efficiency in the previous section
involved the requirement that the variance of the limiting distribution
of the estimator attain the Cramer-Rao variance. As Weiss and Wolfowitz
[1967] point out, comparing the performance of estimators by their
variances is meaningless for many distributions other than normal.
Furthermore, restricting the class of estimators to those which are
asymptotically normally distributed is not reasonable from a statistical
standpoint. Weiss and Wolfowitz [1967] define asymptotic efficiency to
overcome these problems. An estimator is said to be asymptotically
efficient with respect to any given set containing the true parameter
value if it maximizes, among estimators which satisfy a uniformity
condition for convergence, the limiting probability of being in the given
set. The authors' maximum probability, MP, estimator is asymptotically
efficient in this sense; hence the estimators enjoying this property will
be said to be (asymptotically) efficient in the MP sense.

To determine conditions under which MLE for various econometric
models are asymptotically efficient in the MP sense, Weiss's [1973]
theorem will be used. Before stating the assumptions, we give the
notation used by Weiss. $\theta^0$ will denote any given vector in the
interior of $\Theta \subset R^k$. There must exist 2k sequences of
nonrandom positive quantities $K_1(n),\ldots,K_k(n)$,
$M_1(n),\ldots,M_k(n)$ which satisfy

$\lim_{n\to\infty} K_i(n) = \infty$

$\lim_{n\to\infty} M_i(n) = \infty$,   $\lim_{n\to\infty} M_i(n)/K_i(n) = 0$,   $i = 1,\ldots,k$.

Define the closed set

$N_n(\theta^0) = \{\theta \in \Theta : |\theta_i - \theta_i^0| \le M_i(n)/K_i(n), \text{ for } i = 1,\ldots,k\}$.
We now give two assumptions.

WE1.  There exist nonrandom continuous functions $B_{ij}(\theta)$, for
      $i,j = 1,\ldots,k$, for $\theta \in \Theta$, such that

      $-\frac{1}{K_i(n)K_j(n)} \frac{\partial^2 \log L_n(\theta)}{\partial\theta_i \partial\theta_j} \Big|_{\theta^0}$

      converges stochastically as $n \to \infty$ to $B_{ij}(\theta^0)$,
      when $\theta^0$ is the true parameter value.

Define

$\epsilon_{ij}(\theta,\theta^0,n) \equiv -\frac{1}{K_i(n)K_j(n)} \frac{\partial^2 \log L_n(\theta)}{\partial\theta_i \partial\theta_j} - B_{ij}(\theta^0)$

Let $R_n(\theta^0,\gamma)$ denote the set of observations for which

$\sum_{i=1}^{k} \sum_{j=1}^{k} M_i(n)M_j(n) \sup_{\theta \in N_n(\theta^0)} |\epsilon_{ij}(\theta,\theta^0,n)| < \gamma$.

The second assumption is

WE2.  There exist sequences of nonrandom positive quantities,
      $\{\gamma(n,\theta^0)\}$, $\{\delta(n,\theta^0)\}$ such that
      $\gamma(n,\theta^0) \to 0$, $\delta(n,\theta^0) \to 0$ and

      $P(R_n(\theta^0, \gamma(n,\theta^0))) > 1 - \delta(n,\theta^0)$

      for all $\theta \in N_n(\theta^0)$, where the probability is
      computed under $L_n(\theta)$.
n — 

Under assumptions WE1 and WE2, the MLE of $\theta$ are consistent,
asymptotically normally distributed and asymptotically efficient in the
MP sense. It should be noted that the assumptions do not require i.i.d.
observations. Hereafter, sequences of positive quantities which converge
to zero will be called null sequences.

1.4  Sufficient  Conditions  for  Asymptotic  Efficiency  in  the  MP  Sense 
To  facilitate  the  use  of  Weiss's  theorem  we  prove  three  theorems 
which  imply  that  Weiss's  two  assumptions  are  fulfilled  in  many  econometric 
models.  Thus  the  MLE  for  the  econometric  parameters  have  the  optimality 
properties  we  have  discussed. 

Theorem 1.1 is essentially a weak law of large numbers for the second
partial derivatives of the log likelihood function. However, the
stochastic convergence is shown to be uniform in any decreasing
neighborhood of $\theta$. (Recall that every point of $\Theta$ is an
interior point.) Theorems 1.2 and 1.3 imply that Weiss's assumption WE2
holds. The following notation is used for $i,j = 1,\ldots,k$:

$D_{ij}(n,\theta^*) = -n^{-1} (\partial^2 \log L_n(\theta)/\partial\theta_i \partial\theta_j)|_{\theta^*}$

$B_{ij}(n,\theta^*) = E(D_{ij}(n,\theta^*))$, where the expectation is taken under $\theta^*$

$B_{ij}(\theta) = \lim_{n\to\infty} B_{ij}(n,\theta)$, if the limit exists.

It is sufficient for all our models that $K_i(n) = n^{1/2}$ for
$i = 1,\ldots,k$ and $M_i(n) = M(n)$ for $i = 1,\ldots,k$. Further
restrictions on $M(n)$ are given in theorem 1.3.


Theorem  1.1 

Let  (e.(n)},  i = l,...,k  be  positive  sequences  which  each  go 


Assume that there exist two null sequences $\{\nu_{ij}(n,\theta^0)\}$ and
$\{\eta_{ij}(n,\theta^0)\}$ such that, for all $\theta$ in $A_n(\theta^0)$,

$|B_{ij}(n,\theta) - B_{ij}(\theta)| \le \nu_{ij}(n,\theta^0)$  and

$|\mathrm{Var}(D_{ij}(n,\theta))| \le \eta_{ij}(n,\theta^0)$

where the variance is taken under $\theta$. Then there exist two null
sequences $\{\delta^1_{ij}(n,\theta^0)\}$ and $\{\delta^2_{ij}(n,\theta^0)\}$
such that

$P\{|D_{ij}(n,\theta) - B_{ij}(\theta)| > \delta^1_{ij}(n,\theta^0)\} < \delta^2_{ij}(n,\theta^0)$  for all $\theta \in A_n(\theta^0)$

where the probability is taken under $\theta$. Furthermore, if
$B_{ij}(n,\theta)$ is continuous as a function of $\theta$, then
$B_{ij}(\theta)$ is also continuous.
Proof:

For any $\xi_n > 0$, Chebyshev's inequality implies

$P\{|D_{ij}(n,\theta) - B_{ij}(\theta)| > \xi_n\}$

   $\le \xi_n^{-2} (\mathrm{Var}(D_{ij}(n,\theta)) + (B_{ij}(n,\theta) - B_{ij}(\theta))^2)$

   $\le \xi_n^{-2} (\eta_{ij}(n,\theta^0) + \nu_{ij}(n,\theta^0))$  for all $\theta \in A_n(\theta^0)$,

where the last step uses $\nu_{ij}^2 \le \nu_{ij}$, which holds once $n$ is
large enough that $\nu_{ij}(n,\theta^0) \le 1$.

Let $\xi_n = \max\{\eta_{ij}(n,\theta^0)^{1/4}, \nu_{ij}(n,\theta^0)^{1/4}\}$.

Then $\delta^2_{ij}(n,\theta^0) = \xi_n^{-2} (\eta_{ij}(n,\theta^0) + \nu_{ij}(n,\theta^0))$

   $\le (\eta_{ij}(n,\theta^0))^{1/2} + (\nu_{ij}(n,\theta^0))^{1/2}$

is a null sequence. Furthermore, $\delta^1_{ij}(n,\theta^0) = \xi_n$ is also
a null sequence, and the first claim is proved. To prove the continuity of
$B_{ij}(\theta)$, we note first that the continuity of $B_{ij}(n,\theta)$
implies the existence of a null sequence $\{\alpha_n(\theta^0)\}$ such that
if $\theta, \theta'$ are in $A_n(\theta^0)$,

$|B_{ij}(n,\theta) - B_{ij}(n,\theta')| \le \alpha_n(\theta^0)$

For any $\xi > 0$, choose $n$ so large that
$2\nu_{ij}(n,\theta^0) + \alpha_n(\theta^0) < \xi$.
Then, for $\theta, \theta'$ in $A_n(\theta^0)$,

$|B_{ij}(\theta) - B_{ij}(\theta')| \le |B_{ij}(\theta) - B_{ij}(n,\theta)| + |B_{ij}(n,\theta) - B_{ij}(n,\theta')|$

   $+ |B_{ij}(n,\theta') - B_{ij}(\theta')| < \xi$

Therefore continuity of $B_{ij}$ is proven.
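The choice of the threshold in the proof can be checked with a small numerical sketch. The code below is illustrative only: the sequences eta = 1/n and nu = n^(-1/2) are hypothetical stand-ins for the variance and bias bounds of the theorem, and the function name is our own. It computes the threshold xi = max(eta^(1/4), nu^(1/4)) and the resulting Chebyshev bound (eta + nu)/xi^2, and confirms that the bound does not exceed eta^(1/2) + nu^(1/2).

```python
def chebyshev_rates(eta, nu):
    """Return (delta1, delta2) for variance bound eta and bias bound nu.

    delta1 is the threshold xi = max(eta^(1/4), nu^(1/4));
    delta2 is the Chebyshev bound (eta + nu) / xi^2.
    """
    xi = max(eta ** 0.25, nu ** 0.25)
    return xi, (eta + nu) / (xi * xi)

for n in (10, 100, 10_000, 1_000_000):
    eta, nu = 1.0 / n, 1.0 / n ** 0.5  # hypothetical null sequences
    delta1, delta2 = chebyshev_rates(eta, nu)
    # Since xi^2 >= eta^(1/2) and xi^2 >= nu^(1/2), the bound below holds.
    assert delta2 <= eta ** 0.5 + nu ** 0.5 + 1e-12
    print(n, delta1, delta2)
```

Both printed columns shrink as n grows, which is the sense in which the two sequences of the theorem are null sequences.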


Theorem  1.2 


For $A_n(\theta^0)$ as defined in theorem 1.1, suppose the null sequences
$\{\delta^1_{ij}(n,\theta^0)\}$ and $\{\delta^2_{ij}(n,\theta^0)\}$ of
theorem 1.1 exist, and that $B_{ij}(\theta)$ is continuous in $\theta$.
Then there exist two null sequences $\{\gamma^1_{ij}(n,\theta^0)\}$ and
$\{\gamma^2_{ij}(n,\theta^0)\}$ such that

$P\{\sup_{\theta \in A_n(\theta^0)} |\epsilon_{ij}(\theta,\theta^0,n)| > \gamma^1_{ij}(n,\theta^0)\} \le \gamma^2_{ij}(n,\theta^0)$

where the probability is taken under $\theta$, for all
$\theta \in A_n(\theta^0)$.

Proof:

We can express

$\epsilon_{ij}(\theta,\theta^0,n) = D_{ij}(n,\theta) - B_{ij}(\theta) + B_{ij}(\theta) - B_{ij}(\theta^0)$

For all $\theta \in A_n(\theta^0)$,

$|D_{ij}(n,\theta) - B_{ij}(\theta)| \le \delta^1_{ij}(n,\theta^0)$

with probability (under $\theta$) greater than
$1 - \delta^2_{ij}(n,\theta^0)$. The continuity of $B_{ij}$ implies that
there exists a null sequence $\{\rho_{ij}(n,\theta^0)\}$ such that
whenever $\theta$ is in $A_n(\theta^0)$,

$|B_{ij}(\theta) - B_{ij}(\theta^0)| < \rho_{ij}(n,\theta^0)$

Let $\gamma^1_{ij}(n,\theta^0) = \delta^1_{ij}(n,\theta^0) + \rho_{ij}(n,\theta^0)$
and $\gamma^2_{ij}(n,\theta^0) = \delta^2_{ij}(n,\theta^0)$.
Then $|\epsilon_{ij}(\theta,\theta^0,n)| < \gamma^1_{ij}(n,\theta^0)$ for all
$\theta$ in $A_n(\theta^0)$ implies

$\sup_{\theta \in A_n(\theta^0)} |\epsilon_{ij}(\theta,\theta^0,n)| \le \gamma^1_{ij}(n,\theta^0)$

and the probability under $\theta$ of the first event is at least
$1 - \gamma^2_{ij}(n,\theta^0)$ for all $\theta$ in $A_n(\theta^0)$.
Therefore the theorem is proven.

Theorem  1.3 

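Assume the existence of null sequences $\{\gamma^1_{ij}(n,\theta^0)\}$ and
$\{\gamma^2_{ij}(n,\theta^0)\}$ for $i,j = 1,\ldots,k$ of theorem 1.2. Then
there exist two null sequences $\{\gamma^1(n,\theta^0)\}$ and
$\{\gamma^2(n,\theta^0)\}$ and a positive sequence $\{M(n)\}$ such that
$M(n) \to \infty$, $n^{-1/2}M(n) \to 0$, and

$P\{\sum_{i=1}^{k} \sum_{j=1}^{k} M^2(n) \sup_{\theta \in N_n(\theta^0)} |\epsilon_{ij}(\theta,\theta^0,n)| > \gamma^1(n,\theta^0)\} \le \gamma^2(n,\theta^0)$

where the probability is computed under $\theta$, for every $\theta$ in
$N_n(\theta^0)$.

Proof:

Let $M(n) = \min\{n^{1/2-\delta}, \min_{1 \le i,j \le k} \{\gamma^1_{ij}(n,\theta^0)^{-1/4}\}\}$
where $0 < \delta < 1/2$.
Then $M(n) \to \infty$ and $K(n)^{-1}M(n) \le n^{-1/2} n^{1/2-\delta} \to 0$
as $n \to \infty$.

By assumption,

$P\{M^2(n) \sup_{\theta \in N_n(\theta^0)} |\epsilon_{ij}(\theta,\theta^0,n)| > M^2(n)\gamma^1_{ij}(n,\theta^0)\} < \gamma^2_{ij}(n,\theta^0)$

where the probability is taken under $\theta$, for all $\theta$ in
$N_n(\theta^0)$.

Let $\gamma^1(n,\theta^0) = \sum_{i=1}^{k} \sum_{j=1}^{k} M^2(n)\gamma^1_{ij}(n,\theta^0) \le \sum_{i=1}^{k} \sum_{j=1}^{k} (\gamma^1_{ij}(n,\theta^0))^{1/2}$

so that $\{\gamma^1(n,\theta^0)\}$ is a null sequence. Let

$\gamma^2(n,\theta^0) = \sum_{i=1}^{k} \sum_{j=1}^{k} \gamma^2_{ij}(n,\theta^0)$

$\{\gamma^2(n,\theta^0)\}$ is clearly a null sequence.

Let $E_{ij}$ be the event that
$\{M^2(n) \sup_{\theta \in N_n(\theta^0)} |\epsilon_{ij}(\theta,\theta^0,n)| \ge M^2(n)\gamma^1_{ij}(n,\theta^0)\}$.
Then

$P\{\sum_{i=1}^{k} \sum_{j=1}^{k} M^2(n) \sup_{\theta \in N_n(\theta^0)} |\epsilon_{ij}(\theta,\theta^0,n)| \ge \gamma^1(n,\theta^0)\}$

   $\le P(\bigcup_{i,j=1}^{k} E_{ij}) \le \sum_{i=1}^{k} \sum_{j=1}^{k} P(E_{ij}) \le \sum_{i=1}^{k} \sum_{j=1}^{k} \gamma^2_{ij}(n,\theta^0) = \gamma^2(n,\theta^0)$

The result is proved.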
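The construction of M(n) in the proof can be sketched numerically. The code below is an illustration under stated assumptions: the rates gamma1_ij(n) = n^(-1/2) for every pair (i,j), the value delta = 1/4, and the function names are hypothetical choices of ours. It evaluates M(n) = min(n^(1/2 - delta), min_ij gamma1_ij^(-1/4)) and the aggregate sequence gamma1(n), and checks the bound gamma1(n) <= sum_ij gamma1_ij^(1/2) used in the proof.

```python
def m_sequence(n, gamma1_ij, delta=0.25):
    """M(n) = min(n^(1/2 - delta), min over (i,j) of gamma1_ij^(-1/4))."""
    return min(n ** (0.5 - delta), min(g ** -0.25 for g in gamma1_ij))

def gamma1_total(n, gamma1_ij, delta=0.25):
    """gamma1(n) = sum over (i,j) of M(n)^2 * gamma1_ij."""
    m = m_sequence(n, gamma1_ij, delta)
    return sum(m * m * g for g in gamma1_ij)

k = 2
for n in (10 ** 2, 10 ** 4, 10 ** 6, 10 ** 8):
    rates = [n ** -0.5] * (k * k)  # hypothetical gamma1_ij(n) for all i, j
    m = m_sequence(n, rates)
    total = gamma1_total(n, rates)
    # M(n)^2 * gamma <= gamma^(1/2) because M(n) <= gamma^(-1/4).
    assert total <= sum(g ** 0.5 for g in rates) + 1e-12
    print(n, m, m / n ** 0.5, total)
```

In the printed rows, M(n) grows, M(n)/n^(1/2) shrinks, and gamma1(n) shrinks, matching the three properties claimed in the theorem.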

We note that to apply theorems 1.1, 1.2 and 1.3, it is necessary
only to verify that, for all $i,j = 1,\ldots,k$, $B_{ij}(n,\theta)$ is
continuous as a function of $\theta$, $B_{ij}(n,\theta)$ converges as
$n \to \infty$ uniformly in a compact set containing $\theta^0$, and that
the variance of $D_{ij}(n,\theta)$, computed under $\theta$, converges
uniformly to zero in a compact set containing $\theta^0$.

1.5 Optimality Results for Functions of the Model Parameters

For  some  models,  it  is  convenient  to  prove  optimality  properties 
for  a parameter  set,  then  infer  that  the  same  properties  hold  for  an 
image  set  of  the  original  parameter  set.  This  is  convenient,  for 
instance,  when  the  error  variables  are  an  autoregressive  process.  It 
is  also  convenient  when  the  model  is  a linear  system  of  equations. 

Nocturne  [1970]  has  proved  a theorem  which  justifies  this  inference  when 
the  observations  form  a Markov  process.  We  will  prove  a similar  theorem. 
Let  the  mapping  which  defines  the  image  parameter  set  be 


1.4  $H: (\theta_1,\ldots,\theta_k) \to (H_1(\theta),\ldots,H_m(\theta))$

where $m \le k$. We want to prove that optimality properties which hold
for $\theta$ also hold for $H$.

Theorem  1.4. 

Suppose a parameter map $H$ is defined by 1.4. If $m = k$, the map
is one-to-one. If $m < k$, then there exist $k - m$ functions
such that the augmented map $\theta \to H$ is one-to-one. Assume that the
hypotheses for theorems 1.1 and 1.2 hold for $D_{ij}(n,\theta)$, for
$i,j = 1,\ldots,k$. Assume also that

$\partial\theta_p/\partial H_i$,   $\partial^2\theta_p/\partial H_i \partial H_j$

exist and are continuous functions of $H$, for $p,i,j = 1,\ldots,k$. Then
$H(\theta(n))$ is the MLE of $H(\theta)$, where $\theta(n)$ is the MLE of
$\theta$. Furthermore, $H(\theta(n))$ is consistent, asymptotically
normally distributed, and efficient in the MP sense.

Proof:  Since $\theta \to H$ is bijective, the joint density of the
observations may be written as a function of $\theta$ or as a function
of $H$, viz.

$L_n(\theta) \equiv L_n(H)$

It is clear that $H(\theta(n))$ is the MLE of $H(\theta)$. To verify the
hypotheses of theorems 1.1, 1.2 and 1.3 for

$D_{ij}(n,H) = -n^{-1} \partial^2 \log L_n(H)/\partial H_i \partial H_j$

we note that

1.5  $D_{ij}(n,H) = \sum_{a=1}^{k} \frac{\partial^2\theta_a}{\partial H_i \partial H_j} \left(-n^{-1} \frac{\partial \log L_n(\theta)}{\partial\theta_a}\right) + \sum_{a=1}^{k} \sum_{b=1}^{k} \frac{\partial\theta_a}{\partial H_i} \frac{\partial\theta_b}{\partial H_j} \left(-n^{-1} \frac{\partial^2 \log L_n(\theta)}{\partial\theta_a \partial\theta_b}\right)$

The expected value, under $H$, of 1.5 is

$B_{ij}(n,H) = \sum_{a=1}^{k} \sum_{b=1}^{k} (\partial\theta_a/\partial H_i)(\partial\theta_b/\partial H_j) B_{ab}(n,\theta)$

since the first derivative of the log likelihood has expectation zero.
Since the hypotheses of theorem 1.1 hold for $B_{ab}(n,\theta)$ and the
first partial derivatives of $\theta_a$ are continuous as functions of
$\theta$, $B_{ij}(n,H)$ is a continuous function of $H$, and converges,
as $n \to \infty$, to $B_{ij}(H)$. Furthermore, the convergence is uniform
in decreasing neighborhoods of $H(\theta^0)$. The variance of
$D_{ij}(n,H)$ is


1.6  $\sum_{a=1}^{k} \sum_{b=1}^{k} \frac{\partial^2\theta_a}{\partial H_i \partial H_j} \frac{\partial^2\theta_b}{\partial H_i \partial H_j} \mathrm{cov}\left(-n^{-1} \frac{\partial \log L_n(\theta)}{\partial\theta_a}, -n^{-1} \frac{\partial \log L_n(\theta)}{\partial\theta_b}\right)$

     $+ \sum_{a=1}^{k} \sum_{b=1}^{k} \sum_{c=1}^{k} \sum_{d=1}^{k} \frac{\partial\theta_a}{\partial H_i} \frac{\partial\theta_b}{\partial H_j} \frac{\partial\theta_c}{\partial H_i} \frac{\partial\theta_d}{\partial H_j} \mathrm{cov}(D_{ab}(n,\theta), D_{cd}(n,\theta))$

In the first term of 1.6, the covariance term is less than or equal to

$\mathrm{Var}(-n^{-1} \partial \log L_n(\theta)/\partial\theta_a)^{1/2} \, \mathrm{Var}(-n^{-1} \partial \log L_n(\theta)/\partial\theta_b)^{1/2}$.

Under the appropriate uniform convergence theorems, we have

$\mathrm{Var}(-n^{-1} \partial \log L_n(\theta)/\partial\theta_a) = n^{-1} B_{aa}(n,\theta)$

Since the second partial derivatives of $\theta_a$ are continuous as
functions of $\theta$ and $B_{aa}(n,\theta)$ converges as $n \to \infty$
uniformly in a decreasing neighborhood of $\theta^0$, the first term of
1.6 goes to zero as $n \to \infty$, uniformly in a decreasing neighborhood
of $H(\theta^0)$.

In the second term of 1.6, the absolute value of the covariance
term is bounded by

$\mathrm{Var}(D_{ab}(n,\theta))^{1/2} \, \mathrm{Var}(D_{cd}(n,\theta))^{1/2}$

By assumption, both factors converge to zero as $n \to \infty$, uniformly
in a decreasing neighborhood of $\theta^0$. Since the first partial
derivatives of $\theta$ with respect to $H$ are continuous functions of
$\theta$, the second term of 1.6 goes to zero as required, uniformly in a
decreasing neighborhood of $H(\theta^0)$.

We have shown that the hypotheses of theorems 1.1, 1.2 and 1.3
hold for $D_{ij}(n,H)$ for $i,j = 1,\ldots,k$. Therefore $H(\theta(n))$ is
consistent, asymptotically normally distributed and efficient in the
MP sense. The theorem is proved.
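The first conclusion of the theorem, that the image of the MLE is the MLE of the image, can be illustrated with a short sketch. The example is ours and is not drawn from the models of the later chapters: it assumes an i.i.d. sample from a normal distribution with mean zero and variance v, takes the one-to-one map H(v) = v^(1/2), and maximizes the log likelihood numerically in each parameterization. The helper `argmax_golden` is a hypothetical golden-section search written for this illustration.

```python
import math
import random

def loglik_variance(v, ys):
    """Log likelihood of an i.i.d. N(0, v) sample as a function of v."""
    n = len(ys)
    return -0.5 * n * math.log(2.0 * math.pi * v) - sum(y * y for y in ys) / (2.0 * v)

def loglik_sigma(s, ys):
    """The same likelihood written in terms of H = sigma = v^(1/2)."""
    return loglik_variance(s * s, ys)

def argmax_golden(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):      # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

rng = random.Random(1)
ys = [rng.gauss(0.0, 2.0) for _ in range(500)]

v_hat = argmax_golden(lambda v: loglik_variance(v, ys), 0.01, 100.0)
s_hat = argmax_golden(lambda s: loglik_sigma(s, ys), 0.1, 10.0)
closed_form = sum(y * y for y in ys) / len(ys)  # known MLE of v in this model

print(v_hat, closed_form, s_hat, math.sqrt(v_hat))
```

The two numerical maximizers should agree in the sense that s_hat equals the square root of v_hat up to the tolerance of the search.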


1.6 Four Useful Lemmas

This section contains four lemmas which are required in the
following chapters. These lemmas establish the convergence of
certain series which will appear frequently. The proofs of these lemmas
are simple, and are therefore not given.


Lemma 1.1  If $|\rho| < 1$, and $\delta > 0$, then


lim  n 
n-*» 


n n 

l l 


t=l  s-1 


0 


Lemma 1.2  If $|\rho| < 1$, and $\delta > 0$, then

$\lim_{n\to\infty} n^{-1-\delta} \sum_{t=1}^{n} \sum_{s=1}^{n} |t-s| \rho^{|t-s|} = 0$
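A numerical check of Lemma 1.2 can be sketched as follows. This is an illustration rather than a proof, and it assumes the summand is |t-s| rho^|t-s|, which is our reading of the statement. The double sum is collapsed by lag: for each lag d >= 1 there are 2(n-d) ordered pairs (t,s) with |t-s| = d. The function name and the values rho = 0.8 and delta = 0.5 are choices made here for illustration.

```python
def scaled_double_sum(n, rho, delta):
    """n^(-1-delta) * sum over t, s = 1..n of |t-s| * rho^|t-s|."""
    # Collapse by lag d = |t-s|: 2*(n-d) ordered pairs share each lag d >= 1.
    total = sum(2 * (n - d) * d * rho ** d for d in range(1, n))
    return total / n ** (1 + delta)

def brute_force(n, rho, delta):
    """Direct evaluation of the same quantity, for cross-checking small n."""
    total = sum(abs(t - s) * rho ** abs(t - s)
                for t in range(1, n + 1) for s in range(1, n + 1))
    return total / n ** (1 + delta)

values = [scaled_double_sum(n, 0.8, 0.5) for n in (100, 10_000, 100_000)]
print(values)
```

The double sum grows only linearly in n, so dividing by n^(1+delta) drives the printed values toward zero for any delta > 0, although the decay may be slow when delta is small.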


Lemma 1.3  For $t = 1,2,\ldots$, let $S_t(\theta)$ be a continuous function
of $\theta$ over a compact set $C$. Let
$T_n(\theta) = n^{-1} \sum_{t=1}^{n} S_t(\theta)$. If

$\lim_{t\to\infty} S_t(\theta)$

exists, then

$\lim_{n\to\infty} T_n(\theta)$

exists. Also, convergence of $T_n(\theta)$ as $n \to \infty$ is uniform in
$C$ if convergence of $S_t$ as $t \to \infty$ is uniform in $C$.


Lemma 1.4  For $k,m = 1,2,\ldots$, let $S_k(\theta)$, $T_m(\theta)$ be
continuous functions of $\theta$ over a compact set $C$. Suppose

$\lim_{t\to\infty} \sum_{k=1}^{t} |S_k(\theta)| = S(\theta) < \infty$  for every $\theta$ in $C$

and

$\lim_{t\to\infty} \sum_{m=1}^{t} |T_m(\theta)| = T(\theta) < \infty$  for every $\theta$ in $C$.


lim  l 
t-*°®  k=i 


t 

I 

m=l 


S.  (0)  T (0)u. 
k — m — km 


Then 


CHAPTER  II:  LINEAR  MODELS  WITH  LAGGED  DEPENDENT  VARIABLES  AS  REGRESSORS 


2.1  Results  from  the  Literature 


It  is  desired  to  determine  optimal  properties  for  the  MLE  of  the 
single  equation  model 


'a  yt  = J/sVg  * j.Wt.h  y et 


for $t = 1,2,\ldots$,

where $\{e_t, t = 1,2,\ldots\}$ are independently, normally distributed,
with zero expectation and unknown variance $\sigma^2$. In all results and
theorems in this section, the stability condition will be assumed to hold;
i.e., the roots of the polynomial equation


o ' - - . . . -Bq  = 0 


are less than one in absolute value. This assumption assures that the
oscillations of the error variables and exogenous variables are not
amplified in the $\{y_t, t = 1,2,\ldots\}$.
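The stability condition can be checked numerically for given coefficients. The sketch below is hedged in two ways. First, it assumes the polynomial has the usual form z^G - beta_1 z^(G-1) - ... - beta_G = 0 in the coefficients of the lagged dependent variables; the symbol z and this exact form are our assumption. Second, the root finder is a plain Durand-Kerner iteration written for this illustration, not a routine taken from the literature cited here.

```python
def polynomial_roots(coeffs, iterations=500):
    """Roots of the monic polynomial z^G + coeffs[0] z^(G-1) + ... + coeffs[G-1].

    Uses the Durand-Kerner simultaneous iteration from fixed complex starts.
    """
    degree = len(coeffs)
    roots = [(0.4 + 0.9j) ** k for k in range(degree)]
    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            value = 1.0 + 0j
            for c in coeffs:              # Horner evaluation at r
                value = value * r + c
            denom = 1.0 + 0j
            for j, other in enumerate(roots):
                if j != i:
                    denom *= r - other
            updated.append(r - value / denom)
        roots = updated
    return roots

def is_stable(betas):
    """True if all roots of z^G - beta_1 z^(G-1) - ... - beta_G lie inside the unit circle."""
    roots = polynomial_roots([-b for b in betas])
    return all(abs(r) < 1.0 for r in roots)

print(is_stable([0.5, 0.3]))   # lag coefficients of a hypothetical stable model
print(is_stable([0.9, 0.3]))   # lag coefficients of a hypothetical unstable model
```

For two lags the result can be confirmed by hand with the quadratic formula: the first pair of coefficients gives roots of about 0.85 and -0.35, the second a root of about 1.16.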

Koopmans, Rubin, and Leipnik [1950] assume that the variables
$\{x_{t,h}\}$ are nonrandom and that

$n^{-1} \sum_{t} x_{t,h} x_{t+c,k}$

converges for $h,k$ in $\{1,\ldots,H\}$ and for any positive integer $c$.


Under these assumptions the authors show that the MLE of the parameter
vector $\beta$ are consistent and that their asymptotic distribution is
jointly normal with mean vector $\beta$ and covariance matrix
$\sigma^2 N^{-1}$, where the matrix $N$ may be written as

2.3  $N = \begin{pmatrix} M_{yy} & M_{y0} \\ M_{y0}' & M_{00} \end{pmatrix}$

$M_{yy}$ is the GxG matrix whose h,k'th element is the stochastic limit of

$n^{-1} \sum_{t=1}^{n} y_{t-h} y_{t-k}$,   $h,k = 1,\ldots,G$.

$M_{y0}$ is the GxK matrix whose h,k'th element is the stochastic limit of

$n^{-1} \sum_{t=1}^{n} y_{t-h} x_{t,k}$,   $h = 1,\ldots,G$;  $k = 1,\ldots,K$

and $M_{00}$ is the KxK matrix whose h,k'th element is the limit of

$n^{-1} \sum_{t=1}^{n} x_{t,h} x_{t,k}$,   $h,k = 1,\ldots,K$.


The MLE, $\hat\beta(n)$, of $\beta$ are shown to be efficient in the sense
that the asymptotic concentration ellipsoid of
$n^{1/2}(\hat\beta(n) - \beta)$, which is the set
$\{e : e'Ne \le \sigma^2\}$, is contained in that of every other consistent
estimator. Normality of the $e_t$ is not necessary for the proofs of

consistency and asymptotic normality of the MLE. This condition may
be relaxed to the condition that the error distributions have zero
means and finite moments of every order. However, the normality
assumption is required for the efficiency result.

Theil [1971] also considers the linear model with lagged dependent
variables. For consistency and asymptotic normality of the estimators,
the stability condition 2.2 is assumed. The convergence of

$(n-c)^{-1} \sum_{t=1}^{n-c} x_{t,h} x_{t+c,k}$,   $h,k = 1,\ldots,K$

is required, for $c = 0,1,\ldots,G$. (Koopmans, Rubin, and Leipnik [1950]
require this convergence for all positive integer $c$.) The limit
matrix of

$n^{-1} \sum_{t=1}^{n} x_t x_t^T$

which is $M_{00}$ in the notation of Koopmans et al., is required to be
positive definite. Assuming that the error terms are independent,
identically distributed with zero mean, positive variance $\sigma^2$ and
finite moments of every order, and regarding the presample values
$y_0, y_{-1}, \ldots, y_{1-G}$ as fixed, the least squares estimators of
the parameters are shown to be consistent and asymptotically normally
distributed with zero mean and covariance matrix equal to
$\sigma^2 N^{-1}$, where $N$ is as defined by term 2.3. If, in addition,
$\{e_1,\ldots,e_n\}$ are normally distributed, and if $b$ is any other
consistent estimator of the parameters, then the covariance matrix of the
limiting distribution exceeds $\sigma^2 N^{-1}$ by a positive semidefinite
matrix. The proof of this result is not given in Theil [1971].

Nocturne [1970] considers the model 2.1 where the exogenous
variables are no longer considered as constants, but as a stationary
stochastic process independent of the error terms, such that

$|\mathrm{cov}(x_{t,h}^{i}, x_{t+c,k}^{j})| \le C\rho^c$   for $i,j = 1,2,3,4,4+\delta$;
                                                           for $c = 0,1,2,\ldots$;
                                                           for $h,k = 1,\ldots,K$;

where $C > 0$, $0 < \rho < 1$, and $\delta > 0$. Stationarity implies the
marginal distributions of $x_t$ are identical for each $t$. If the marginal
distribution is degenerate with mass one at some point, then $x_t$ equals
that point for all $t$. That is, if the exogenous variables are nonrandom,
they must be constant. The error terms are assumed to be independently,
normally distributed with expectation 0 and unknown variance $\sigma^2$.
Nocturne also assumes that the process $\{y_t, t = 1,2,\ldots\}$ is
stationary, which implies the stability condition 2.2. Under these
assumptions, the MLE of the parameters are shown to be consistent,
asymptotically normally distributed, and efficient in the MP sense.

Other authors have dealt with various modifications of 2.1.
Mann and Wald [1943] show, for the model 2.1 with no exogenous variables,
that the MLE are consistent and asymptotically normally distributed,
when the stability condition holds and the error variables are
independently distributed with zero expectation and all other moments
finite. Grenander and Rosenblatt [1957] also consider model 2.1 with no
exogenous variables. Assuming that the stability condition holds, the
error variables are independently, identically distributed with
expectation zero and the next three moments finite, and the $\{y_t\}$ are
second order stationary, the quasi-MLE (that is, the MLE computed under
the assumption of normality of the errors) are shown to be consistent and
asymptotically normally distributed.


Durbin [1960] derives a different criterion of optimality which the MLE of the parameters of 2.1 satisfy. If an estimator $b$ of a scalar parameter $\theta$ is defined by the linear equation

2.4   $T_1 b + T_2 = 0$

where $T_1$ and $T_2$ are functions of the observations such that $T_2/T_1$ is independent of the unknown parameters, and

$E(T_1\theta + T_2) = 0$,

then 2.4 is called an unbiased linear estimating equation. An unbiased linear estimating equation $t_1 b + t_2 = 0$, with $E(t_1)=1$ and

$\mathrm{var}(t_1\theta + t_2) \le \mathrm{var}(s_1\theta + s_2)$

for all other unbiased linear estimating equations $s_1 b + s_2 = 0$, with $E(s_1)=1$, is called a best unbiased linear estimating equation. Durbin derives a lower bound to $\mathrm{var}(t_1\theta + t_2)$ similar to the derivation of the Cramer-Rao lower bound. He also shows that if the log likelihood function is a quadratic function of the unknown parameters, then the likelihood equations are best unbiased linear estimating equations. In this chapter we shall prove the optimality properties of Nocturne.

2.2 Optimality Properties for the MLE when the Exogenous Variables are Nonrandom

We  will  consider  first  the  univariate  model  2.1  where  the  exogenous 
variables  are  assumed  to  be  nonrandom.  The  assumptions  are  similar 
to  those  of  Koopmans,  Rubin,  and  Leipnik  [1950],  but  the  results  are 
stronger,  since  efficiency  in  the  MP  sense  is  stronger  than  efficiency 
in  the  sense  of  having  the  smallest  asymptotic  concentration  ellipsoid. 

Theorem 2.1

For model 2.1, assume the error terms are independent and normally distributed, with expectation zero and positive unknown variance $\sigma^2$. Assume that the stability condition 2.2 holds and that the exogenous variables are nonrandom, satisfying

$L_n(\beta,\sigma^2) = \prod_{t=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\{-(2\sigma^2)^{-1}(y_t - \sum_{g=1}^{G}\beta_g y_{t-g} - \sum_{h=1}^{K}\beta_{G+h}x_{t,h})^2\}$


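The second derivatives of the log likelihood function, with respect to the parameters to be estimated, are:

2.5   $-n^{-1}\,\partial^2 \log L_n(\beta,\sigma^2)/\partial\beta_i\,\partial\beta_j = (n\sigma^2)^{-1}\sum_{t=1}^{n} y_{t-i}\,y_{t-j}$,   $i,j=1,\ldots,G$

2.6   $-n^{-1}\,\partial^2 \log L_n(\beta,\sigma^2)/\partial\beta_{G+i}\,\partial\beta_{G+j} = (n\sigma^2)^{-1}\sum_{t=1}^{n} x_{t,i}\,x_{t,j}$,   $i,j=1,\ldots,K$

2.7   $-n^{-1}\,\partial^2 \log L_n(\beta,\sigma^2)/(\partial\sigma^2)^2 = (-2\sigma^4)^{-1} + (n\sigma^6)^{-1}\sum_{t=1}^{n} e_t^2$

2.8   $-n^{-1}\,\partial^2 \log L_n(\beta,\sigma^2)/\partial\beta_i\,\partial\beta_{G+j} = (n\sigma^2)^{-1}\sum_{t=1}^{n} x_{t,j}\,y_{t-i}$,   $i=1,\ldots,G$; $j=1,\ldots,K$

2.9   $-n^{-1}\,\partial^2 \log L_n(\beta,\sigma^2)/\partial\sigma^2\,\partial\beta_i = (n\sigma^4)^{-1}\sum_{t=1}^{n} e_t\,y_{t-i}$,   $i=1,\ldots,G$

2.10   $-n^{-1}\,\partial^2 \log L_n(\beta,\sigma^2)/\partial\sigma^2\,\partial\beta_{G+j} = (n\sigma^4)^{-1}\sum_{t=1}^{n} e_t\,x_{t,j}$,   $j=1,\ldots,K$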

It is necessary to show that the expected values of 2.5 through 2.10 converge to functions of the parameters which are continuous, and that the variances are bounded above by terms which go to 0 as $n \to \infty$, uniformly for values of the unknown parameters in any compact set.

First we will undertake the proofs for term 2.5. The variables

$\Phi_s(t) = \beta_1\Phi_s(t-1) + \beta_2\Phi_s(t-2) + \ldots + \beta_{t-s}\Phi_s(s)$

$\Phi_s(s) = 1$,   $\Phi_s(s-h) = 0$ for all $h > 0$.

Using the last restriction we may write the above system as

$\Phi_s(t) = \beta_1\Phi_s(t-1) + \beta_2\Phi_s(t-2) + \ldots + \beta_G\Phi_s(t-G)$

for $0 \le s < t$, where the $\Phi$'s satisfy, for $s=1,\ldots,t$,

$\Phi_s(s)=1$,   $\Phi_s(s-1)=0$,   $\ldots$,   $\Phi_s(s-G+1)=0$

$\Phi_0(0)=y_0$,   $\Phi_0(-1)=y_{-1}$,   $\ldots$,   $\Phi_0(-G+1)=y_{-G+1}$.

The solution to this system of difference equations is

$\Phi_s(t) = P_{1s}(t)\rho_1^{t} + \ldots + P_{Hs}(t)\rho_H^{t}$   for $0 \le s < t$

where $\rho_1,\ldots,\rho_H$ are the distinct roots of 2.2 and $P_{is}(t)$ is a polynomial in $t$ whose degree is one less than the multiplicity of the root $\rho_i$. These polynomials are determined uniquely from the initial conditions and the multiplicity of the roots. Evaluating the polynomials at the initial conditions, we have

$1 = \Phi_s(s) = \sum_{i=1}^{H} P_{is}(s)\rho_i^{s}$

$0 = \Phi_s(s-j) = \sum_{i=1}^{H} P_{is}(s-j)\rho_i^{s-j}$,   $j=1,\ldots,G-1$

Thus $\tilde P_{is}(t) = P_{i1}(t-s+1)/\rho_i^{s-1}$ is a polynomial of the same degree as $P_{is}(t)$ and also satisfies the initial conditions. Therefore we must have $P_{is}(t) = \tilde P_{is}(t)$ and

2.12   $\Phi_s(t) = \sum_{i=1}^{H} P_{i1}(t-s+1)\rho_i^{t-s+1}$   for $0 < s < t$.

In the computations which follow, we will assume without loss of generality that all $G$ roots are distinct, and therefore $P_{i1}(t-s+1)$ has degree zero and is not a function of $t-s$. Therefore let $\lambda_i = P_{i1}(t-s+1)$ for $i=1,\ldots,G$. The parameters $\lambda_i$ and $\rho_i$ are continuous as functions of $\beta_1,\beta_2,\ldots,\beta_G$, $y_0,y_{-1},\ldots,y_{-G+1}$. When the dependence on $\beta$ is to be emphasized, the parameters will be written as $\lambda_i(\beta)$ and $\rho_i(\beta)$.
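The agreement between the recursion and the root representation can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, it treats the case $G=2$ with distinct roots, and it assumes that the stability condition 2.2 refers to the roots of $\rho^2 - \beta_1\rho - \beta_2 = 0$. For $s \ge 1$ the initial conditions $\Phi_s(s)=1$, $\Phi_s(s-1)=0$ give $\lambda_1 = 1/(\rho_1-\rho_2)$ and $\lambda_2 = -\lambda_1$.

```python
import cmath

def phi_recursive(beta, s, t):
    """Phi_s(t) from the difference equation with Phi_s(s) = 1 and
    Phi_s(s-h) = 0 for h = 1..G-1 (the case s >= 1)."""
    G = len(beta)
    vals = {s - h: 0.0 for h in range(1, G)}
    vals[s] = 1.0
    for u in range(s + 1, t + 1):
        vals[u] = sum(beta[g - 1] * vals[u - g] for g in range(1, G + 1))
    return vals[t]

def phi_closed_form_G2(beta1, beta2, s, t):
    """sum_i lambda_i * rho_i**(t-s+1) for G = 2 and distinct roots.
    The roots may be complex conjugates; the sum is real either way."""
    disc = cmath.sqrt(beta1 ** 2 + 4 * beta2)
    r1, r2 = (beta1 + disc) / 2, (beta1 - disc) / 2
    lam1 = 1 / (r1 - r2)
    lam2 = -lam1
    k = t - s + 1
    return (lam1 * r1 ** k + lam2 * r2 ** k).real
```

Because only $t-s$ enters the closed form, the same two constants $\lambda_1,\lambda_2$ serve for every $s \ge 1$, which is the content of 2.12 in the distinct-root case.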

The proof that the expectation of 2.5 converges follows.

Assuming that $i \ge j$,

2.13   $E(n^{-1}\sum_{t=1}^{n} y_{t-i}\,y_{t-j}) = n^{-1}\sum_{t=1}^{j} y_{t-i}\,y_{t-j} + n^{-1}\sum_{t=j+1}^{i} y_{t-i}\,E(y_{t-j})$
   $+ n^{-1}\sum_{t=i+1}^{n}\sum_{k=0}^{t-i}\sum_{m=0}^{t-j}\Phi_k(t-i)\,\Phi_m(t-j)\,E(\phi_k\phi_m)$

The first term is not a function of $\beta$ and goes to zero as $n \to \infty$. The summand of the second term of 2.13 is bounded above by a continuous function of $\beta$ and therefore the second term goes to zero uniformly for $(\beta,\sigma^2)$ in a compact set. To evaluate the third term, we calculate

2.14   $E(\phi_k\phi_m) = (\sum_{a=1}^{K}\beta_{G+a}x_{k,a})(\sum_{b=1}^{K}\beta_{G+b}x_{m,b}) + \delta_{km}\sigma^2$,   $k,m>0$
   $= \sum_{a=1}^{K}\beta_{G+a}x_{k,a}$,   $k>0$, $m=0$
   $= 1$,   $k=m=0$.

So the third term of 2.13 equals

2.15   $n^{-1}\sum_{t=i+1}^{n}\Phi_0(t-i)\,\Phi_0(t-j) + n^{-1}\sum_{t=i+1}^{n}\Phi_0(t-i)\sum_{m=1}^{t-j}\Phi_m(t-j)(\sum_{a=1}^{K}\beta_{G+a}x_{m,a})$
   $+ n^{-1}\sum_{t=i+1}^{n}\Phi_0(t-j)\sum_{k=1}^{t-i}\Phi_k(t-i)(\sum_{a=1}^{K}\beta_{G+a}x_{k,a})$
   $+ n^{-1}\sum_{t=i+1}^{n}\sum_{k=1}^{t-i}\sum_{m=1}^{t-j}\Phi_k(t-i)\,\Phi_m(t-j)\,[(\sum_{a=1}^{K}\beta_{G+a}x_{k,a})(\sum_{b=1}^{K}\beta_{G+b}x_{m,b}) + \delta_{km}\sigma^2]$

where $\delta_{km}$ is 1 if $k=m$ and is otherwise 0. The convergence, uniformly in $\beta$, of the second and third terms of 2.15 may be proven in the same manner as the following proof for the convergence to zero of the first term.


2.16   $n^{-1}\sum_{t=i+1}^{n}\Phi_0(t-i)\,\Phi_0(t-j) = \sum_{c=1}^{G}\sum_{d=1}^{G}\lambda_c\lambda_d\,(n^{-1}\sum_{t=i+1}^{n}\rho_c^{t-i+1}\rho_d^{t-j+1})$
   $= n^{-1}\sum_{c=1}^{G}\sum_{d=1}^{G}\lambda_c\lambda_d\,\rho_d^{i-j}\,[(\rho_c\rho_d)^2 - (\rho_c\rho_d)^{n-i+2}]/(1-\rho_c\rho_d)$

which converges uniformly to 0 for $(\beta,\sigma^2)$ in a compact set, since $\rho_k$ is less than one in absolute value, for $k=1,\ldots,G$, and since $\lambda_k$ and $\rho_k$ are continuous functions of $\beta$, for $k=1,\ldots,G$.
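The closed form used for the inner sum over $t$ is a finite geometric series, and it can be confirmed numerically. The sketch below is illustrative; the function names and the sample values of $\rho_c$, $\rho_d$, $n$, $i$, $j$ are assumptions, with $i \ge j$ and $|\rho_c\rho_d| < 1$.

```python
def direct_sum(rc, rd, n, i, j):
    """sum over t = i+1..n of rc**(t-i+1) * rd**(t-j+1)."""
    return sum(rc ** (t - i + 1) * rd ** (t - j + 1) for t in range(i + 1, n + 1))

def closed_form(rc, rd, n, i, j):
    """rd**(i-j) * [(rc*rd)**2 - (rc*rd)**(n-i+2)] / (1 - rc*rd)."""
    prod = rc * rd
    return rd ** (i - j) * (prod ** 2 - prod ** (n - i + 2)) / (1 - prod)
```

Since the closed form stays bounded in $n$ when the roots lie inside the unit circle, dividing by $n$ sends the whole term to zero, which is the convergence claimed for the first term of 2.15.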

It remains to show that the last term of 2.15 converges uniformly in $\beta$. That term may be written as

2.17   $\sum_{c=1}^{G}\sum_{d=1}^{G}\lambda_c\lambda_d\sum_{a=1}^{K}\sum_{b=1}^{K}\beta_{G+a}\beta_{G+b}\,[n^{-1}\sum_{t=i+1}^{n}\sum_{k=1}^{t-i}\sum_{m=1}^{t-j}\rho_c^{t-i-k+1}\rho_d^{t-j-m+1}x_{k,a}x_{m,b}]$
   $+ \sum_{c=1}^{G}\sum_{d=1}^{G}\lambda_c\lambda_d\,(\sigma^2 n^{-1}\sum_{t=i+1}^{n}\sum_{k=1}^{t-i}\rho_c^{t-i-k+1}\rho_d^{t-j-k+1})$

The last term of 2.17 converges uniformly, for $\beta$ in a compact set, to

$\sigma^2\sum_{c=1}^{G}\sum_{d=1}^{G}\lambda_c\lambda_d\,\rho_d^{i-j+1}\rho_c/(1-\rho_c\rho_d)$.

Assumption 2 implies that the first term of 2.17 converges uniformly. With $r = i-j$, the factor of this term in brackets may be written as

2.18   $\sum_{k=1}^{n-i}\sum_{m=1}^{k+r-1}\rho_c^{k}\rho_d^{k+r-m}\,(n^{-1}\sum_{q=0}^{n-i-k}x_{q+1,a}x_{q+m+1,b})$
   $+ \sum_{k=1}^{n-i}\sum_{m=0}^{n-i-k}\rho_c^{k}\rho_d^{m+k+r}\,(n^{-1}\sum_{q=0}^{n-i-m-k}x_{q+m+1,a}x_{q+1,b})$

We will show that the first term is a Cauchy sequence in $n$. Denote by $S(n-i,m)$ the term

$n^{-1}\sum_{q=0}^{n-i}x_{q+1,a}x_{q+m+1,b}$

Choose $n,p$ sufficiently large so that, for all $m$, $|S(n-i,m)-S(p-i,m)| < \epsilon$. Denoting the first term of 2.18 by $T(n)$, we have

$|T(n)-T(p)| \le \sum_{k=1}^{p-i}\sum_{m=1}^{k+r-1}|\rho_c|^{k}|\rho_d|^{k+r-m}(\epsilon + X^2k/n + X^2k/p) + X^2\sum_{k=p-i+1}^{n-i}\sum_{m=1}^{k+r-1}|\rho_c|^{k}|\rho_d|^{k+r-m}$

where $X$ is defined in assumption 1. It is clear that this upper bound goes to zero as $n,p \to \infty$. Moreover, the convergence of the first term of 2.18 is uniform for $\beta$ in a compact set, since $\rho_c,\rho_d$ are continuous functions of $\beta$. A similar argument shows that the second term of 2.18 converges uniformly for $\beta$ in a compact set.

We will now show that the variance of 2.5 goes to zero, as $n \to \infty$, uniformly for $\beta,\sigma^2$ in a compact set. For $i > j$,

2.19   $(n^2\sigma^4)^{-1}\sum_{t=1}^{n}\sum_{s=1}^{n}\mathrm{cov}(y_{t-i}y_{t-j},\, y_{s-i}y_{s-j})$
   $= (n^2\sigma^4)^{-1}\sum_{t=j+1}^{i}\sum_{s=j+1}^{i}y_{t-i}\,y_{s-i}\,\mathrm{cov}(y_{t-j},\, y_{s-j})$
   $+ 2(n^2\sigma^4)^{-1}\sum_{t=j+1}^{i}\sum_{s=i+1}^{n}y_{t-i}\,\mathrm{cov}(y_{t-j},\, y_{s-i}y_{s-j})$
   $+ (n^2\sigma^4)^{-1}\sum_{t=i+1}^{n}\sum_{s=i+1}^{n}\mathrm{cov}(y_{t-i}y_{t-j},\, y_{s-i}y_{s-j})$

The first two terms go to zero as required since each term of the double summand is bounded in absolute value by a constant, for $\beta,\sigma^2$ in a compact set. The uniform convergence to zero of the last term remains to be shown. The last term may be written as

2.20   $(n^2\sigma^4)^{-1}\sum_{t=i+1}^{n}\sum_{s=i+1}^{n}\sum_{p=0}^{t-i}\sum_{q=0}^{t-j}\sum_{r=0}^{s-i}\sum_{v=0}^{s-j}\Phi_p(t-i)\,\Phi_q(t-j)\,\Phi_r(s-i)\,\Phi_v(s-j)\,\{E(\phi_p\phi_q\phi_r\phi_v) - E(\phi_p\phi_q)E(\phi_r\phi_v)\}$

We can evaluate the term in brackets in 2.20 as

$E(\phi_p\phi_q\phi_r\phi_v) - E(\phi_p\phi_q)E(\phi_r\phi_v)$
   $= \sigma^4 + \sigma^2(m_p^2 + m_q^2)$   if $p=r$, $q=v$, $p\ne q$ or if $p=v$, $q=r$, $p\ne q$
   $= \sigma^2 m_q m_v$   if $p=r$, $q\ne v$, $q\ne p\ne v$
   $= \sigma^2 m_q m_r$   if $p=v$, $q\ne r$, $q\ne p\ne r$
   $= \sigma^2 m_p m_r$   if $q=v$, $p\ne r$, $p\ne q\ne r$
   $= \sigma^2 m_p m_v$   if $q=r$, $p\ne v$, $p\ne q\ne v$
   $= 2\sigma^2 m_r m_p$   if $p=q=v$, $r\ne p$
   $= 2\sigma^2 m_v m_p$   if $p=q=r$, $v\ne p$
   $= 2\sigma^2 m_p m_q$   if $q=r=v$, $p\ne q$
   $= 2\sigma^2 m_p m_q$   if $p=r=v$, $q\ne p$
   $= 2\sigma^4 + 4\sigma^2 m_p^2$   if $p=q=r=v$
   $= 0$   otherwise,

where $E(\phi_p) = m_p = \sum_{a=1}^{K}\beta_{G+a}x_{p,a}$ for $p \ge 1$.

The terms involving $\phi_0 = 1$ may be ignored, since they are bounded above by a continuous function of $(\beta,\sigma^2)$ which goes to zero as $n \to \infty$. We will prove the same is true for each of the above terms, assuming without loss of generality that $t \ge s$ and $j \ge i$. Let $A,B,C,D$ represent distinct elements of $\{t-i,t-j,s-i,s-j\}$. $A$ and $B$ are said to be of the same type if $A,B$ are in the set $\{t-i,t-j\}$ or the set $\{s-i,s-j\}$. The sums involved are either single, double, or triple sums over $p,q,r$ and $v$, since $\mathrm{cov}(\phi_p\phi_q,\phi_r\phi_v) = 0$ unless at least two indices are equal. Let $f(S,X)$ be an upper bound for each of the above terms, when $(\beta,\sigma^2)$ are in the compact set $S$. $X$ was defined in assumption 1) as an upper bound for the exogenous variables.
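The case-by-case values of $\mathrm{cov}(\phi_p\phi_q,\phi_r\phi_v)$ follow from the first four moments of a normal variable and the independence of the $\phi$'s across indices. The sketch below recomputes them exactly; it is illustrative only, and the function names and the numerical means are assumptions. Each $\phi_k$ is taken to be $N(m_k,\sigma^2)$, as for $k \ge 1$ in the nonrandom-regressor case.

```python
from collections import Counter

def normal_moment(m, s2, k):
    """E[(m + e)**k] for e ~ N(0, s2) and k = 0, 1, 2, 3, 4."""
    return [1.0,
            m,
            m * m + s2,
            m ** 3 + 3 * m * s2,
            m ** 4 + 6 * m * m * s2 + 3 * s2 * s2][k]

def expect_product(indices, means, s2):
    """E[prod of phi_idx] using independence across distinct indices."""
    out = 1.0
    for idx, count in Counter(indices).items():
        out *= normal_moment(means[idx], s2, count)
    return out

def cov_pairs(p, q, r, v, means, s2):
    """cov(phi_p * phi_q, phi_r * phi_v)."""
    return (expect_product([p, q, r, v], means, s2)
            - expect_product([p, q], means, s2) * expect_product([r, v], means, s2))
```

For example, `cov_pairs(1, 1, 1, 1, means, s2)` reproduces $2\sigma^4 + 4\sigma^2 m_1^2$, and any call with four distinct indices returns zero, which is why only single, double, and triple sums survive.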

We will note first that

$|\Phi_p(t-i)| \le (\sum_{c=1}^{G}|\lambda_c|)\,\rho^{t-i-p+1}$

where $\rho = \max\{|\rho_1|,|\rho_2|,\ldots,|\rho_G|\}$, and $\rho$ is a continuous function of $\beta$. The single summand is of the form

$\sum_{p=1}^{s-j}\Phi_p(t-i)\,\Phi_p(t-j)\,\Phi_p(s-i)\,\Phi_p(s-j)\,f_p(\sigma^2,\beta,x)$

where $f_p$ is a continuous function of its arguments. This summation is bounded above in absolute value by

$f(S,X)\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4\,(\rho^{2(t-s)+2(j-i)} - \rho^{2(t+s-i-j)})(1-\rho^4)^{-1}$

which is bounded above by a continuous function of the parameters and exogenous variables times $\rho^{t-s}$.

The form of the double summand is

$\sum_{p=1}^{A}\sum_{q=1}^{B}\Phi_p(A)\,\Phi_q(B)\,\Phi_r(C)\,\Phi_v(D)\,f_{pq}(\sigma^2,\beta,x)$

where $f_{pq}$ is a covariance term. Hence it is bounded above by $f(S,X)$. Assume first that $A,B$ are of the same type. Then $C,D$ are of the same type. Consider first the case where $r=v=p$, which will be argued identically to the case $r=v=q$, by symmetry. In this case the summand equals

$\sum_{p=1}^{A}\sum_{q=1}^{B}\Phi_p(A)\,\Phi_q(B)\,\Phi_p(C)\,\Phi_p(D)\,f_{pq}(\sigma^2,\beta,x) \le f(S,X)\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4(1-\rho)^{-1}(1-\rho^3)^{-1}\rho^{C+D-2B}$

$C$ and $D$ are both of the type opposite from $B$, and $C > B$, $D > B$. Since $t \ge s$ and $j \ge i$, either $t-s>j-i$, which implies $t-i \ge t-j > s-i \ge s-j$, or $t-s \le j-i$, which implies $t-i \ge s-i \ge t-j \ge s-j$. Therefore we must have $C,D$ in $\{t-i,t-j\}$ and $B$ in $\{s-i,s-j\}$, so that $C+D-2B = 2(t-s)-(j-i)$ if $B=s-i$ and $C+D-2B = 2(t-s)+(j-i)$ if $B=s-j$. So an upper bound for the summand is

$f(S,X)\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4(1-\rho)^{-1}(1-\rho^3)^{-1}\rho^{t-s}$

Now, with $A,B$ of the same type, suppose $r$ and $v$ are unequal, and, without loss of generality, $p=r$ and $q=v$, which implies $C > A$ and $D \ge B$. Then an upper bound for the absolute value of the summand is

$f(S,X)\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4(1-\rho^2)^{-2}\rho^{C-A+D-B}$

The restrictions $C > A$, $D > B$, $A$ and $B$ of the same type imply that $A,B$ are in $\{s-i,s-j\}$ and $C,D$ are in $\{t-i,t-j\}$, which implies that

$\rho^{C-A+D-B} \le \rho^{2(t-s)} \le \rho^{t-s}$

So an upper bound for the absolute value of this summand is

$f(S,X)\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4(1-\rho^2)^{-2}\rho^{t-s}$
Now the double summand with $A,B$ of different types will be considered. Suppose without loss of generality that $C$ is of the same type as $A$ and $D$ is of the same type as $B$. If $r=p$ and $v=q$, then $\mathrm{cov}(\phi_p\phi_p,\, \phi_q\phi_q) = 0$ for $p \ne q$, so this term does not appear among the covariance terms. Now suppose that $r=q$ and $v=p$, which implies $C > B$ and $D > A$. Then

$\sum_{p=1}^{A}\sum_{q=1}^{B}\Phi_p(A)\,\Phi_q(B)\,\Phi_q(C)\,\Phi_p(D)\,f_{pq}(\sigma^2,\beta,x) \le f(S,X)\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4(1-\rho^2)^{-2}\rho^{C-B+D-A}$

This term appears only when $t-i>s-i>t-j>s-j$, or $j-i>t-s$, and in this case $C-B+D-A = 2(j-i) \ge t-s$. Therefore an upper bound for the absolute value of the summand is

$f(S,X)\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4(1-\rho^2)^{-2}\rho^{t-s}$

Retaining the assumption that $A,B$ are of different types, suppose $p=r=v$, and $C \ge A$, $D \ge A$. Then

$\sum_{p=1}^{A}\sum_{q=1}^{B}\Phi_p(A)\,\Phi_p(C)\,\Phi_p(D)\,\Phi_q(B)\,f_{pq}(\sigma^2,\beta,x) \le f(S,X)\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4(1-\rho)^{-1}(1-\rho^3)^{-1}\rho^{D+C-2A}$

If $t-i \ge t-j \ge s-i \ge s-j$ (i.e., if $t-s \ge j-i \ge 0$) then the restriction that $C \ge A$ and $D \ge A$, $C$ and $A$ of the same type, implies $C=s-i$ and $A=s-j$. If $D=t-i$ then $D+C-2A = t-s+2(j-i) \ge t-s$. If $D=t-j$ then $D+C-2A = t-s+j-i \ge t-s$. If $t-i>s-i>t-j>s-j$ (i.e., if $0<t-s<j-i$) then there are several possibilities. First let $C=s-i$ and $A=s-j$. If $D=t-i$, then $D+C-2A = t-s+2(j-i) > t-s$. If $D=t-j$, then $D+C-2A = t-s+j-i \ge t-s$. Secondly, let $C=t-i$ and $A=t-j$. This implies that $D=s-i$ and $B=s-j$, so that $D+C-2A = j-i+s-i-(t-j) \ge j-i \ge t-s$. Having exhausted all possibilities we may conclude that an upper bound is

$f(S,X)\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4(1-\rho)^{-1}(1-\rho^3)^{-1}\rho^{t-s}$

This ends the analysis of terms which are double summands.

Now let us consider terms which are triple summands. These terms are of the form

$\sum_{p=1}^{A}\sum_{q=1}^{B}\sum_{r=1}^{C}\Phi_p(A)\,\Phi_q(B)\,\Phi_r(C)\,\Phi_p(D)\,f_{pqr}(\sigma^2,\beta,x)$

If $A,D$ are of the same type, $f_{pqr}(\sigma^2,\beta,x) = \mathrm{cov}(\phi_p\phi_p,\, \phi_q\phi_r) = 0$ when $p \ne q \ne r$. If the summand is to be nonzero, $D$ must be of a different type than $A$. In this case an upper bound of the absolute value of the summand is

$f(S,X)\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4(1-\rho^2)^{-1}(1-\rho)^{-2}\rho^{t-s}$

This completes the determination of upper bounds for $\mathrm{cov}(y_{t-i}y_{t-j},\, y_{s-i}y_{s-j})$ with $t \ge s$ and $j \ge i$. We may conclude, by applying Lemma 1.1, that, for $\beta,\sigma^2$ in a compact set $S$,

$n^{-2}\sum_{t=i+1}^{n}\sum_{s=i+1}^{n}\mathrm{cov}(y_{t-i}y_{t-j},\, y_{s-i}y_{s-j}) \le \mathrm{const.}\,(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^4(1-\rho^2)^{-1}(1-\rho)^{-2}\,(n^{-2}\sum_{t=i+1}^{n}\sum_{s=i+1}^{n}\rho^{|t-s|})$

which goes to zero as $n \to \infty$, uniformly for $\beta,\sigma^2$ in a compact set $S$.
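The final step rests on the averaged double sum $n^{-2}\sum_t\sum_s\rho^{|t-s|}$ vanishing as $n$ grows. The sketch below evaluates that sum and compares it with the elementary bound $(1+\rho)/((1-\rho)n)$, which follows from $\sum_{d\ge 1}\rho^d = \rho/(1-\rho)$. It is illustrative only: the function names are hypothetical, and the bound is offered as the kind of statement Lemma 1.1 presumably supplies, not as a restatement of that lemma.

```python
def averaged_double_sum(rho, n):
    """n**-2 * sum_{t,s=1..n} rho**|t-s|, grouped by the lag d = |t-s|."""
    total = n + 2 * sum((n - d) * rho ** d for d in range(1, n))
    return total / n ** 2

def brute_force(rho, n):
    """Same quantity by direct double summation (for small n only)."""
    return sum(rho ** abs(t - s) for t in range(1, n + 1)
               for s in range(1, n + 1)) / n ** 2

def geometric_bound(rho, n):
    """Upper bound (1 + rho) / ((1 - rho) * n), valid for 0 <= rho < 1."""
    return (1 + rho) / ((1 - rho) * n)
```

The bound is of order $n^{-1}$ for each fixed $\rho < 1$, and it is uniform over any compact parameter set on which $\rho$ stays bounded away from one.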

Next we will show that the expected value of 2.6 converges as $n \to \infty$ uniformly for $\beta,\sigma^2$ in a compact set. Assumption 2) implies this. Since the $\{x_{t,j}\}$ are nonrandom, the variance of 2.6 is zero.

For term 2.7, we determine that its expected value is $(2\sigma^4)^{-1}$, which is not a function of $n$, but is clearly continuous in $\sigma^2$. The variance of 2.7 is $2(n\sigma^8)^{-1}$, which goes to zero uniformly for $\sigma^2$ in a compact set.

For term 2.8, we calculate the expected value as

2.22   $(n\sigma^2)^{-1}\sum_{t=1}^{i}x_{t,j}\,y_{t-i} + (n\sigma^2)^{-1}\sum_{t=i+1}^{n}x_{t,j}\,(\Phi_0(t-i) + \sum_{k=1}^{t-i}\Phi_k(t-i)\,m_k)$

where, for $k \ge 1$, $m_k = \sum_{a=1}^{K}\beta_{G+a}x_{k,a} = E(\phi_k)$. Since the first term of 2.22 is a sum of constants times $n^{-1}$, it goes to zero uniformly in $\beta,\sigma^2$. The second term may be written as

2.23   $\sigma^{-2}\sum_{c=1}^{G}\sum_{a=1}^{K}\lambda_c\,\beta_{G+a}\sum_{k=1}^{n-i}\rho_c^{k}\,(n^{-1}\sum_{t=k}^{n-i}x_{t+i,j}\,x_{t-k+1,a}) + \sum_{c=1}^{G}\lambda_c\,(n\sigma^2)^{-1}\sum_{t=i+1}^{n}x_{t,j}\,\rho_c^{t-i+1}$

By assumption 1, the second term of 2.23 goes to zero uniformly for $\beta,\sigma^2$ in a compact set. Using assumption 2) and arguments similar to those used for term 2.18, it is possible to show that the first term of 2.23 converges, as $n \to \infty$, uniformly for $\beta$ in a compact set. Therefore we conclude that the expected value of 2.8 converges in a similar manner.

We must also show that the variance of 2.8 converges uniformly to zero. The variance of 2.8 is

2.24   $(n^2\sigma^4)^{-1}\sum_{t=i+1}^{n}\sum_{s=i+1}^{n}x_{t,j}\,x_{s,j}\,\mathrm{cov}(y_{t-i},\, y_{s-i})$

For $t \ge s$,

$\mathrm{cov}(y_{t-i},\, y_{s-i}) = \sigma^2\sum_{k=1}^{s-i}\Phi_k(t-i)\,\Phi_k(s-i) \le \sigma^2(\sum_{c=1}^{G}|\lambda_c|)^2\,\rho^2(1-\rho^2)^{-1}\rho^{t-s}$

Therefore, by Lemma 1.1, term 2.24 goes to zero, uniformly for $\beta,\sigma^2$ in a compact set.

To compute the expected value of 2.9, note that $y_{t-i}$ is a sum of error terms $e_k$ for $k \le t-i$. Since the error terms are independent, the expected value of 2.9 is zero. The variance of 2.9 is

2.25   $(n^2\sigma^8)^{-1}\,[\,\sum_{t=1}^{i}y_{t-i}^2\,\sigma^2 + \sum_{t=i+1}^{n}\Phi_0(t-i)^2\,\sigma^2 + \sum_{t=i+1}^{n}2\Phi_0(t-i)\sum_{p=1}^{t-i}\Phi_p(t-i)\,\sigma^2 m_p$
   $+ \sum_{t=i+1}^{n}\sum_{k=1}^{t-i}\sum_{p=1}^{t-i}\Phi_k(t-i)\,\Phi_p(t-i)\,[\delta_{kp}\sigma^4 + m_k m_p\sigma^2]\,]$

which goes to zero uniformly for $\beta,\sigma^2$ in a compact set since 2.25 is bounded above by a continuous function of $\beta,\sigma^2$ times $n^{-1}$.

It is clear that the expected value of 2.10 is zero and the variance of 2.10 is

$(n\sigma^6)^{-1}(n^{-1}\sum_{t=1}^{n}x_{t,j}^2)$

Since the exogenous variables are bounded, the variance goes to zero as $n \to \infty$, uniformly for $\sigma^2$ in a compact set.

Since the assumptions for theorems 1.1 and 1.2 are satisfied, theorem 1.3 may be applied to conclude that the MLE of $(\beta,\sigma^2)$ are consistent, asymptotically normally distributed, and asymptotically efficient in the MP sense. The asymptotic covariance matrix is the matrix of the stochastic limits of 2.5 through 2.10.

2.3 Optimality Properties for the MLE when the Exogenous Variables are Random

Asymptotic properties for the MLE for models with lagged dependent variables may also be derived when the exogenous variables are assumed to be random. Nocturne's assumptions that $\{y_t,\ t=1,2,\ldots\}$ is a stationary process and that the covariances of the third and fourth powers of the exogenous variables are decreasing exponentially with their difference in time are not necessary. Our assumptions regarding the exogenous variables involve powers no higher than two.

Theorem 2.2

For the model 2.1, assume the exogenous variables $\{x_t,\ t=1,2,\ldots\}$ are identically distributed, independently of the error sequence $\{e_t,\ t=1,2,\ldots\}$, which is itself normally distributed with mean 0 and variance $\sigma^2$, independently. Assume that the stability condition 2.2 holds. Assume also that

1)  $|\mathrm{cov}(x_{t,k},\, x_{s,m})| = |R_{km}(|t-s|)| \le C\pi^{|t-s|}$,
    for $k,m=1,\ldots,K$; $t,s=1,2,\ldots$

2)  $|\mathrm{cov}(x_{t,k},\, x_{s,m}x_{w,i})| \le C\pi^{M}$, where $M=\max\{|t-s|,\, |t-w|\}$,
    for $k,m,i=1,\ldots,K$; $t,s,w=1,2,\ldots$

3)  $|\mathrm{cov}(x_{t,k}x_{v,j},\, x_{s,m}x_{w,i})| \le C\pi^{M}$, where $M=\max\{|t-s|,\, |t-w|,\, |v-s|,\, |v-w|\}$,
    for $k,m,i,j=1,\ldots,K$; for $t,s,w,v=1,2,\ldots$

where $C$ is a positive constant and $0 < \pi < 1$. Then the MLE of $\beta,\sigma^2$ are consistent, asymptotically normally distributed, and efficient in the MP sense.

Proof:

Denote the moments of the process $\{x_t\}$ as

$E(x_{t,k}) = \mu_k$

$\mathrm{cov}(x_{t,h},\, x_{s,k}) = R_{hk}(|t-s|)$   for $h,k=1,\ldots,K$; for $t,s=1,2,\ldots$

Since the process $\{x_t\}$ is independent of the error terms, and its density is not a function of the parameters to be estimated, the second derivatives of the log likelihood function are identical with 2.5 through 2.10. As in the proof of theorem 2.1, $y_t$ can be expressed as

$y_t = \sum_{k=0}^{t}\Phi_k(t)\,\phi_k$

where

$\phi_k = e_k + \sum_{a=1}^{K}\beta_{G+a}x_{k,a}$,   $\Phi_k(t) = \sum_{m=1}^{G}\lambda_m\rho_m^{t-k+1}$   for $k=0,\ldots,t$; $t=1,\ldots,n$

Analyzing term 2.5 first, we calculate that, for $t > i > j$,

2.26   $E(y_{t-i}y_{t-j}) = \Phi_0(t-i)\,\Phi_0(t-j)$
   $+ (\sum_{a=1}^{K}\beta_{G+a}\mu_a)\,\{\Phi_0(t-i)\sum_{m=1}^{t-j}\Phi_m(t-j) + \Phi_0(t-j)\sum_{k=1}^{t-i}\Phi_k(t-i)\}$
   $+ \sigma^2\sum_{k=1}^{t-i}\Phi_k(t-i)\,\Phi_k(t-j)$
   $+ \sum_{a=1}^{K}\sum_{b=1}^{K}\beta_{G+a}\beta_{G+b}\,\{\sum_{k=1}^{t-i}\sum_{m=1}^{t-j}\Phi_k(t-i)\,\Phi_m(t-j)\,[R_{ab}(|k-m|) + \mu_a\mu_b]\}$

Summing 2.26 over $t$ from $i+1$ to $n$, and dividing by $n$, yields the term whose convergence we must show. Clearly the first two terms go to zero as $n \to \infty$, uniformly for $\beta,\sigma^2$ in a compact set. The average over $t$ from $i+1$ to $n$ of the third term of 2.26 is identical to the second term of 2.17. Lemmas 1.3 and 1.4 imply that the fourth term converges as $n \to \infty$ uniformly for $\beta,\sigma^2$ in a compact set.

To show that the variance of 2.5 converges to 0 uniformly in $\beta$, we have only to check the convergence of 2.20, under the assumption that the sequence of exogenous variables is random. With $m_p$ defined as in the proof of theorem 2.1,

$m_p = \sum_{a=1}^{K}\beta_{G+a}x_{p,a}$

the covariance function required may be expressed as

2.27   $\mathrm{cov}(\phi_p\phi_q,\, \phi_r\phi_v) = \mathrm{cov}(m_pm_q,\, m_rm_v) + \mathrm{cov}(e_pe_q,\, e_re_v)$
   $+ E(m_pm_r)E(e_qe_v) + E(m_pm_v)E(e_qe_r)$
   $+ E(m_qm_r)E(e_pe_v) + E(m_qm_v)E(e_pe_r)$.

Convergence for each of the terms of 2.27 will be verified. The first term can be written as

$\mathrm{cov}(m_pm_q,\, m_rm_v) = \sum_{a=1}^{K}\sum_{b=1}^{K}\sum_{c=1}^{K}\sum_{d=1}^{K}\beta_{G+a}\beta_{G+b}\beta_{G+c}\beta_{G+d}\,\mathrm{cov}(x_{p,a}x_{q,b},\, x_{r,c}x_{v,d})$

whose absolute value is, by assumption 3, bounded above by

$C(\sum_{a=1}^{K}|\beta_{G+a}|)^4\,\pi^{|p-r|}$

since $\max\{|p-r|,|p-v|,|q-r|,|q-v|\} \ge |p-r|$, and $0 < \pi < 1$. So, for the first term of 2.27, we have

$|(n^2\sigma^4)^{-1}\sum_{t=i+1}^{n}\sum_{s=i+1}^{n}\sum_{p=1}^{t-i}\sum_{q=1}^{t-j}\sum_{r=1}^{s-i}\sum_{v=1}^{s-j}\Phi_p(t-i)\,\Phi_q(t-j)\,\Phi_r(s-i)\,\Phi_v(s-j)\,\mathrm{cov}(m_pm_q,\, m_rm_v)|$
   $\le C(\sum_{a=1}^{K}|\beta_{G+a}|)^4(\sum_{c=1}^{G}|\lambda_c|)^4\,\rho^2(1-\rho)^{-2}(n^2\sigma^4)^{-1}\mu^2(1-\mu^2)^{-1}\,\{\sum_{t=i+1}^{n}\sum_{s=i+1}^{n}(|t-s|\,\mu^{|t-s|-1} + (3-\mu^2)(1-\mu^2)^{-1}\mu^{|t-s|})\}$

where $\mu = \max\{|\rho_1|,|\rho_2|,\ldots,|\rho_G|,\pi\}$. By lemmas 1.1, 1.2 this upper bound converges to zero uniformly for $\beta,\sigma^2$ in a compact set.

For the second term of 2.27, we note that

$\mathrm{cov}(e_pe_q,\, e_re_v) = 2\sigma^4$   if $p=q=r=v$
   $= \sigma^4$   if $p=r$, $q=v$, $p\ne q$ or $p=v$, $q=r$, $p\ne q$
   $= 0$   otherwise
Therefore,  we  have  that,  if  i > j 


t=i+l 


“ L * m1 
m=l 


n t-i  t-j  s-i  s-j 

l l l l l 4><t-i)  ♦ (t-j)  ♦ (s-i)  * (s-j)  * 

s=i+l  p=l  q=l  r=l  v=l  F 1 

cov(ee,ee) 
p q r v 


t=l  s=l 
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% 2 
which, by lemma 1.1, goes to zero uniformly for $\beta,\sigma^2$ in a compact set.

Thus  the  mean  and  variance  of  2.5  converge  as  required. 

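The expected value of term 2.6 is

$\sigma^{-2}(R_{ij}(0) + \mu_i\mu_j)$

which is not a function of $n$. Also, the variance of 2.6 is

$(n^2\sigma^4)^{-1}\sum_{t=1}^{n}\sum_{s=1}^{n}\mathrm{cov}(x_{t,i}x_{t,j},\, x_{s,i}x_{s,j})$

which is bounded above in absolute value by

$C(n^2\sigma^4)^{-1}\sum_{t=1}^{n}\sum_{s=1}^{n}\pi^{|t-s|}$

which goes to zero as $n \to \infty$, uniformly for $\beta,\sigma^2$ in a compact set, by lemma 1.1.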

The  mean  and  variance  of  2.7  converge  as  indicated  in  the  proof 
of  theorem  2.1. 

The expectation of 2.8 is

2.28   $(n\sigma^2)^{-1}\sum_{t=1}^{i}\mu_j\,y_{t-i} + (n\sigma^2)^{-1}\sum_{t=i+1}^{n}\Phi_0(t-i)\,\mu_j$
   $+ \sum_{m=1}^{G}\sum_{a=1}^{K}\lambda_m\,\beta_{G+a}\,(n\sigma^2)^{-1}\sum_{t=i+1}^{n}\sum_{k=1}^{t-i}\rho_m^{t-i-k+1}\,[\mu_j\mu_a + R_{ja}(|t-k|)]$

The first and second terms of 2.28 go to zero uniformly for $\beta,\sigma^2$ in a compact set. Lemma 1.3 implies the third term of 2.28 converges as $n \to \infty$. The continuity of $\lambda_m$ and $\rho_m$ as functions of $\beta$ implies uniform convergence of this term.

It remains to show that the variance of 2.8 goes to zero uniformly in $\beta,\sigma^2$. We can write the variance as

2.29   $(n^2\sigma^4)^{-1}\sum_{t=1}^{i}\sum_{s=1}^{i}y_{t-i}\,y_{s-i}\,\mathrm{cov}(x_{t,j},\, x_{s,j})$
   $+ 2(n^2\sigma^4)^{-1}\sum_{t=1}^{i}\sum_{s=i+1}^{n}y_{t-i}\,\mathrm{cov}(x_{t,j},\, x_{s,j}\,y_{s-i})$
   $+ (n^2\sigma^4)^{-1}\sum_{t=i+1}^{n}\sum_{s=i+1}^{n}\mathrm{cov}(x_{t,j}\,y_{t-i},\, x_{s,j}\,y_{s-i})$

The first term of 2.29 clearly goes to zero as required. The second term goes to zero uniformly if

$\mathrm{cov}(x_{t,j},\, x_{s,j}\,y_{s-i}) = \Phi_0(s-i)\,\mathrm{cov}(x_{t,j},\, x_{s,j}) + \sum_{k=1}^{s-i}\Phi_k(s-i)\sum_{a=1}^{K}\beta_{G+a}\,\mathrm{cov}(x_{t,j},\, x_{s,j}\,x_{k,a})$

is bounded uniformly in $s$ by a continuous function of $\beta,\sigma^2$. By assumption 2, this is true. For the third term of 2.29, we can write


cov(x  .y.  .,  x .y  . ] 

t , j t - 1 S,jyS-l 


G G K K 

2.3o  y y y y x x,  e„  b„,. 

S 3.  4 4c  d MQ+a  ^G+b 

c=l  d-1  a=l  b-1 


t-i  s- 


k=0  m=0 


By assumption 1, $|E(x_{t,j}x_{s,j})| \le C + \mu_j^2$, so that

$|\sum_{k=0}^{t-i}\sum_{m=0}^{s-i}\rho_c^{t-i-k+1}\rho_d^{s-i-m+1}\,\delta_{km}\,\sigma^2\,E(x_{t,j}x_{s,j})| \le \sigma^2(C+\mu_j^2)\,\rho^2(1-\rho^2)^{-1}\rho^{|t-s|}$

Assumption  3 implies  that 


t-i  s-i 

I I I 

k.-O  n=0 


t-i-ktl  s-i-mt] 


cov( X .X,  , X . X , ) 

t , 3 k,a  s , 3 m,b 


2 2,0  2,  I 

U „l  * tt  (3-tt  ) | 
|t-s|  — . V 

1-n  Cl— tr  ) i 


Lemmas 1.1, 1.2 may be invoked to show the convergence of this term. Therefore the convergence of the third term of 2.29 to zero uniformly in $\beta,\sigma^2$ is guaranteed.

The expected value of 2.9 is zero, since $y_{t-i}$ is a linear combination of $\phi_k$, $k \le t-i$, all of which are independent of $e_t$.

The variance of 2.9 is


n t-i 


. * (»'.»•*  T T »k  (t-i) 

t=i+l  k=l 


p. 

r*  2 

K't-i) 


. . o 

t = i+i 


n t-i  t-i 


+ (a'  ’)  ‘ L l L Mt-i)  ♦„ ( t-i ) E(m,  m,) 


t=i+l  k=l  fc=l 


Clearly the first and third terms of 2.32 go to zero uniformly in $\beta,\sigma^2$. Since $\sum_{k=1}^{t-i}\Phi_k^2(t-i)$ is bounded above by a continuous function of $\beta,\sigma^2$, which is not a function of $t$,

$(n^2\sigma^4)^{-1}\sum_{t=i+1}^{n}\sum_{k=1}^{t-i}\Phi_k^2(t-i)$

converges to zero uniformly for $\beta,\sigma^2$ in a compact set. Since

$|E(m_k m_l)| \le C(\sum_{a=1}^{K}|\beta_{G+a}|)^2 + (\sum_{a=1}^{K}\beta_{G+a}\mu_a)^2$

which is not a function of $k$ or $l$, we have that the last term of 2.32 is bounded in absolute value by

$(n\sigma^6)^{-1}(\sum_{c=1}^{G}|\lambda_c|)^2\,\rho^2(1-\rho)^{-2}\,[C(\sum_{a=1}^{K}|\beta_{G+a}|)^2 + (\sum_{a=1}^{K}\beta_{G+a}\mu_a)^2]$

which converges to zero as $n \to \infty$ uniformly for $\beta,\sigma^2$ in a compact set. Therefore the variance of 2.9 converges as required.

The expected value of 2.10 is zero, and the variance is, by assumption 1, bounded in absolute value by

$(n\sigma^6)^{-1}(C + \mu_j^2)$

which converges to zero as required.

Since the conditions for theorems 1.1 and 1.2 hold for 2.5 through 2.10, we apply theorem 1.3 to show the MLE of $\beta,\sigma^2$ have the desired properties. Theorem 2.2 is proved.

2.4  Optimality  Results  for  the  Multivariate  Linear  Model  with  Lagged 
Dependent  Variables  and  Nonrandom  Exogenous  Variables 


The results pertaining to optimal properties for the linear model with lagged dependent variables may be extended to the situation where the variables are multivariate. The appropriate model is

2.33   $y_t = \sum_{g=1}^{G}B_g\,y_{t-g} + \sum_{h=1}^{K^*}B_{G+h}\,x_{t,h} + e_t$,   $t=1,2,\ldots$

where $B_1,\ldots,B_G$ are the $L$ by $L$ matrices of unknown parameters and $B_{G+h}$, for $h=1,\ldots,K^*$, are the $L$ by $M_h$ matrices of unknown parameters. The endogenous variable $y_t$ is $L \times 1$ dimensioned. The exogenous variable $x_{t,h}$ is $M_h \times 1$ dimensioned, for $h=1,\ldots,K^*$, and the error terms are $L \times 1$ dimensioned. To simplify the notation, define $\Gamma = [B_{G+1},\ldots,B_{G+K^*}]$ as the $L \times K$ dimensioned matrix of unknown coefficients of $z_t = [x_{t,1}^T, x_{t,2}^T,\ldots,x_{t,K^*}^T]^T$, which is $K \times 1$ dimensioned, where $K = \sum_{h=1}^{K^*}M_h$. The model 2.33 becomes

2.34   $y_t = \sum_{g=1}^{G}B_g\,y_{t-g} + \Gamma z_t + e_t$,   $t=1,2,\ldots$

As in the univariate case, the stability condition will be assumed; i.e., all $LG$ roots of the determinantal equation

2.35   $|\rho^{G}I - \rho^{G-1}B_1 - \ldots - B_G| = 0$

are less than one in absolute value. Several theorems are presented below for different distributional assumptions on $z_t$ and $e_t$.

For notational simplicity, the elements in row $i$, column $j$ of a matrix $A$ will be denoted by $(A)_{ij}$. We will also denote by $(v)_i$ the $i$'th component of the vector $v$. The notation $(M)_{\cdot k}$, $(M)_{k\cdot}$ will denote respectively the $k$'th column of $M$ and the $k$'th row of $M$. The symbol $\sigma^{ij}$ will denote the element in row $i$, column $j$ of $\Sigma^{-1}$.
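The stability condition can be checked numerically for given coefficient matrices. The sketch below is an illustration under stated assumptions, not a procedure taken from the text: it uses the standard fact that the roots of the determinantal equation are the eigenvalues of the $LG \times LG$ companion matrix built from $B_1,\ldots,B_G$, and it estimates the largest root modulus by Gelfand's formula $\|C^k\|^{1/k}$ with repeated squaring. All function names are hypothetical, and the estimate is approximate for finite $k$.

```python
import math

def companion(B_list):
    """Stack the L x L blocks B_1..B_G into the LG x LG companion matrix:
    first block row [B_1 ... B_G], identity blocks on the block subdiagonal."""
    G, L = len(B_list), len(B_list[0])
    size = L * G
    C = [[0.0] * size for _ in range(size)]
    for g, B in enumerate(B_list):
        for a in range(L):
            for b in range(L):
                C[a][g * L + b] = B[a][b]
    for k in range(L, size):
        C[k][k - L] = 1.0
    return C

def matmul(A, B):
    """Plain matrix product of two square lists-of-lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def spectral_radius_estimate(C, squarings=10):
    """Gelfand estimate ||C^k||_F ** (1/k) with k = 2**squarings.
    The power is renormalised after each squaring to avoid underflow."""
    P, log_scale, k = C, 0.0, 1
    for _ in range(squarings):
        P = matmul(P, P)
        k *= 2
        norm = math.sqrt(sum(v * v for row in P for v in row))
        if norm == 0.0:          # nilpotent matrix: all roots are zero
            return 0.0
        P = [[v / norm for v in row] for row in P]
        log_scale = 2 * log_scale + math.log(norm)
    return math.exp(log_scale / k)
```

An estimate comfortably below one suggests that the stability condition holds; values close to one call for an exact eigenvalue computation, since the Gelfand estimate converges slowly in that region.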


Theorem 2.3

For the model 2.34, assume that the exogenous variables $z_t$ are nonrandom and the error terms are independent, with a joint normal distribution with mean vector 0 and nonsingular covariance matrix $\Sigma$. Assume also that

1)  $|(z_t)_h| \le Z$   for $h=1,\ldots,K$; for $t=1,2,\ldots$

2)  $\lim_{n\to\infty} n^{-1}\sum_{t=1}^{n-c}z_t\,z_{t+c}^T$ exists, and the convergence is uniform in $c$, $c=1,2,\ldots$

Then the MLE of $B_1,\ldots,B_G,\Gamma,\Sigma$ are consistent, asymptotically normally distributed, and efficient in the MP sense.

Proof:

Consider the values of $y_{1-G}, \ldots, y_0$ as constants. The conditional density of $y_t$, given $y_{t-1}, \ldots, y_{t-G}$, is multivariate normal with mean

$\sum_{g=1}^{G} B_g y_{t-g} + \Gamma z_t$

and with covariance matrix $\Sigma$. The joint density of $y_1, \ldots, y_n$ is therefore the product of the conditional densities at time t given the values of the G previous terms. The following second partial derivatives of the log likelihood function are required.

As in theorem 2.1, $y_t$ can be expressed as


2.46  $y_t = \Phi_0(t) + \Phi_1(t) v_1 + \ldots + \Phi_t(t) v_t, \quad t = 1, \ldots$

where $v_t = \varepsilon_t + \Gamma z_t$, $\Phi_0(t)$ is an Lx1 vector of constants, and $\Phi_j(t)$, for j = 1,...,t, is an LxL matrix of constants. Generalizing from the proof of theorem 2.1, the system of equations which determine $\Phi$ is

2.47  $\Phi_j(t) = B_1 \Phi_j(t-1) + \ldots + B_G \Phi_j(t-G)$  for j = 0,...,t

The initial conditions are

2.48  $\Phi_j(j) = I, \ \Phi_j(j-1) = 0, \ \ldots, \ \Phi_j(j-G+1) = 0, \quad j = 1, \ldots, n$
      $\Phi_0(0) = y_0, \ \Phi_0(-1) = y_{-1}, \ \ldots, \ \Phi_0(-G+1) = y_{-G+1}$

The matrices I and 0 are LxL. The solution to the equations 2.47 with initial conditions 2.48 is

2.49  $\Phi_0(t) = Q_1(t) \rho_1^{t} + \ldots + Q_\ell(t) \rho_\ell^{t}$
      $\Phi_j(t) = P_{1j}(t) \rho_1^{t} + \ldots + P_{\ell j}(t) \rho_\ell^{t}, \quad j = 1, 2, \ldots, t$

where $\rho_1, \rho_2, \ldots, \rho_\ell$ are the distinct solutions of 2.35. $Q_1(t), \ldots, Q_\ell(t)$ are Lx1 vectors whose elements are polynomial functions of t of degree equal to the multiplicity of the corresponding root, less one. $P_{ij}(t)$, for i = 1,...,$\ell$, for j = 1,...,t, are defined similarly. Assuming, without loss of generality, that all LG roots of 2.35 are distinct, the vectors Q and the matrices P have degree zero as polynomial functions of t. Absorbing constant powers of the roots into the constants, the solutions can therefore be written as

$\Phi_j(t) = \sum_{k=1}^{LG} P_k \rho_k^{t-j+1}$

$\Phi_0(t) = \sum_{k=1}^{LG} Q_k \rho_k^{t+1}$


The  expectations  and  variances  of  2.36  through  2.45  can  now  be  computed 
and  shown  to  satisfy  the  assumptions  of  theorems  1.1  and  1.2. 

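For 2.36, if h ≥ g,

2.50  $E\Big(n^{-1} \sum_{t=1}^{n} (y_{t-g})_j (y_{t-h})_m\Big) =$

      $n^{-1} \sum_{t=1}^{g} (y_{t-g})_j (y_{t-h})_m + n^{-1} \sum_{t=g+1}^{h} E(y_{t-g})_j \, (y_{t-h})_m$

      $+ \ n^{-1} \sum_{t=h+1}^{n} (\Phi_0(t-g))_j (\Phi_0(t-h))_m + n^{-1} \sum_{t=h+1}^{n} (\Phi_0(t-g))_j \Big[\sum_{k=1}^{t-h} \Phi_k(t-h) E(v_k)\Big]_m$

      $+ \ n^{-1} \sum_{t=h+1}^{n} \Big[\sum_{k=1}^{t-g} \Phi_k(t-g) E(v_k)\Big]_j (\Phi_0(t-h))_m$

      $+ \ n^{-1} \sum_{t=h+1}^{n} \Big(\sum_{k=1}^{t-g} \sum_{s=1}^{t-h} \Phi_k(t-g) E(v_k v_s^T) \Phi_s(t-h)^T\Big)_{jm}$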


The first two terms of 2.50 clearly go to zero as n → ∞, uniformly for $B_1, \ldots, B_G, \Gamma$ in a compact set. The third term is

$\sum_{a=1}^{LG} \sum_{b=1}^{LG} (Q_a)_j (Q_b)_m \, n^{-1} \sum_{t=h+1}^{n} \rho_a^{t-g+1} \rho_b^{t-h+1}$

which converges to zero uniformly in $B_1, \ldots, B_G$ since $\rho_a$ is a continuous function of these parameters, and $0 < |\rho_a| < 1$, for a = 1,...,LG. The fourth term of 2.50 is


$\sum_{a=1}^{LG} \sum_{b=1}^{LG} (Q_a)_j \, n^{-1} \sum_{t=h+1}^{n} \rho_a^{t-g+1} \sum_{k=1}^{t-h} \rho_b^{t-h-k+1} R_{bm} z_k$

where $R_{bm}$ denotes the m'th row of $P_b \Gamma$. Since the elements of $z_k$ are bounded above in absolute value, and $0 < |\rho_a| < 1$, with $\rho_a$ a continuous function of $B_1, \ldots, B_G$, the above term goes to zero as n → ∞ uniformly for $B_1, \ldots, B_G, \Gamma, \Sigma$ in a compact set. The same result holds for the fifth term. The sixth term is


2.51  $\sum_{a=1}^{LG} \sum_{b=1}^{LG} R_{aj} \Big(n^{-1} \sum_{t=h+1}^{n} \sum_{k=1}^{t-g} \sum_{s=1}^{t-h} \rho_a^{t-g-k+1} \rho_b^{t-h-s+1} z_k z_s^T\Big) R_{bm}^T$

      $+ \ \sum_{a=1}^{LG} \sum_{b=1}^{LG} (P_a \Sigma P_b^T)_{jm} \, n^{-1} \sum_{t=h+1}^{n} \sum_{k=1}^{t-h} \rho_a^{t-g-k+1} \rho_b^{t-h-k+1}$

Assumption 2 implies the first term of 2.51 converges, by the same argument used for the first term of 2.17. The second term of 2.51 converges to

$\sum_{a=1}^{LG} \sum_{b=1}^{LG} (P_a \Sigma P_b^T)_{jm} \, \rho_a^{h-g+1} \rho_b (1 - \rho_a \rho_b)^{-1}$

as n → ∞, uniformly in the unknown parameters. Therefore we have shown that the expected value of 2.36 converges as n → ∞, uniformly in the unknown parameters.

To verify that the variance of 2.36 goes to zero, we write it as

2.52  $(\sigma^{kl})^2 n^{-2} \sum_{t=g+1}^{h} \sum_{s=g+1}^{h} (y_{t-h})_m (y_{s-h})_m \operatorname{cov}\big((y_{t-g})_j, (y_{s-g})_j\big)$

      $+ \ 2 (\sigma^{kl})^2 n^{-2} \sum_{t=g+1}^{h} \sum_{s=h+1}^{n} (y_{t-h})_m \operatorname{cov}\big((y_{t-g})_j, \ (y_{s-g})_j (y_{s-h})_m\big)$

      $+ \ (\sigma^{kl})^2 n^{-2} \sum_{t=h+1}^{n} \sum_{s=h+1}^{n} \operatorname{cov}\big((y_{t-g})_j (y_{t-h})_m, \ (y_{s-g})_j (y_{s-h})_m\big)$


The first two terms of 2.52 go to zero as n → ∞, uniformly for the unknown parameters in a compact set, since each term is bounded above in absolute value by $n^{-1}$ times a continuous function of the parameters. The double summation in the last term of 2.52 is analyzed in the same way as the corresponding univariate term, the last term of 2.19.

Therefore we conclude that 2.52 goes to zero uniformly for the parameters in a compact set. This ends the proof that term 2.36 satisfies the assumptions of theorems 1.1 and 1.2.

Assumption 2 implies that term 2.37 converges as n → ∞. Terms 2.38, 2.39, 2.40 will be shown in section 3.2 to satisfy the conditions of theorems 1.1 and 1.2.

The expected value of 2.41 is the sum of an o(1) term and

Term 2.53 is the multivariate analog of 2.23. The first term of 2.53 converges to zero as n → ∞, since $(z_t)_j$ is bounded by assumption 1, and $0 < |\rho_a| < 1$. Assumption 2 implies the second term of 2.53 converges, applying a similar argument to that of the first term of 2.23. Again, the convergence is uniform for the parameters in a compact set.

The variance of 2.41 is

$n^{-2} (\sigma^{kl})^2 \sum_{t=h+1}^{n} \sum_{s=h+1}^{n} (z_t)_j (z_s)_j \operatorname{cov}\big((y_{t-h})_m, (y_{s-h})_m\big).$

For s > t, we have that $\operatorname{cov}((y_{t-h})_m, (y_{s-h})_m) =$

$\sum_{a=1}^{LG} \sum_{b=1}^{LG} (P_a \Sigma P_b^T)_{mm} \big(\rho_a \rho_b^{s-t+1} - (\rho_a \rho_b)^{t-h+1} \rho_b^{s-t}\big) (1 - \rho_a \rho_b)^{-1}$

Since the components of $z_t$, for all t, are bounded above in absolute value, and $\rho_j$, j = 1,...,LG, are less than one in absolute value, the variance of 2.41 goes to 0 as n → ∞. The convergence is uniform in the parameters since $P_a$, $\rho_a$ are continuous functions of the parameters.

To determine the expected value of 2.42, we note that $y_{t-g}$ is a linear combination of error terms with indices less than or equal to t-g, and therefore independent of $\varepsilon_t$. Therefore the expected value of this term is zero.

The variance of the first term of 2.42 may be bounded above by a continuous function of the unknown parameters which goes to zero as n → ∞, uniformly in the parameters. The argument for 2.43 is similar.

The expected values of 2.44 and 2.45 are zero. The variances of both terms clearly go to zero, since the error terms are independent, and the components of the exogenous variables are bounded above in absolute value.

Since the terms 2.36 through 2.45 satisfy the assumptions of theorems 1.1 and 1.2, we may apply theorem 1.3 to conclude that the hypotheses of Weiss's theorem are satisfied and, therefore, the MLE of $B_1, \ldots, B_G, \Gamma, \Sigma$ have the desired properties.

2.5  Optimality Results for the Multivariate Linear Model with Lagged
Dependent Variables and Random Exogenous Variables

Theorem  2.2  may  also  be  extended  to  the  multivariate  model.  The 
distributional  assumptions  for  the  components  of  the  multivariate 
exogenous  variables  are  the  analogs  of  the  distributional  assumptions 
of  the  univariate  theorem. 

Theorem  2.4 

For the model 2.34, when the stability condition is satisfied, assume that the exogenous variables $\{z_t, t = 1, 2, \ldots\}$ are identically distributed, independently of the error sequence $\{\varepsilon_t, t = 1, 2, \ldots\}$, which are independently distributed, multivariate normal with mean vector 0 and covariance matrix $\Sigma$. Assume also that assumptions 1, 2, 3 of theorem 2.2 hold when x is replaced by z. Then the MLE of $B_1, \ldots, B_G, \Gamma$, and $\Sigma$ are consistent, asymptotically normally distributed, and efficient in the MP sense.

Proof : 

Since the process $z_t$ is independent of the error terms, and the parameters of its joint distribution are not functions of the parameters to be estimated, the second derivatives of the log likelihood function with respect to the parameters are the same as 2.36 through 2.45. The arguments that these terms satisfy the assumptions of theorems 1.1 and 1.2 follow essentially the arguments of theorem 2.2 for the univariate model. The constants $A_m$, m = 1,...,G, are replaced in this theorem by the elements of the constant vectors $Q_a$, a = 1,...,LG, or the elements of the constant matrices $P_a$, a = 1,...,LG. The same arguments are used to show convergence and to bound the variances.

2.6  Linear  Models  with  Autoregressive  Disturbances 

In theorems 2.1 through 2.4, it was assumed that the error terms were serially uncorrelated. The optimality properties also hold, under additional assumptions, when the errors satisfy a finite order autoregressive process. It is necessary to invoke theorem 1.4 in order to obtain the results. This approach is due to Nocturne [1970].

We will consider the multivariate autoregressive process 2.33, where the errors satisfy

$\varepsilon_t = R_1 \varepsilon_{t-1} + \ldots + R_H \varepsilon_{t-H} + u_t$

where $\{u_t, t = 1, \ldots\}$ are distributed independently and identically, multivariate normal with mean 0 and covariance matrix $\Sigma$.

Subtracting $R_1 y_{t-1} + R_2 y_{t-2} + \ldots + R_H y_{t-H}$ from 2.33 yields

2.54  $y_t = \sum_{j=1}^{G+H} A_j y_{t-j} + \sum_{h=0}^{H} \theta_h z_{t-h} + u_t$

where, assuming G ≥ H,

$\theta_0 = \Gamma$
$\theta_1 = -R_1 \Gamma$
...
$\theta_H = -R_H \Gamma$
$A_1 = R_1 + B_1$
$A_2 = R_2 + B_2 - R_1 B_1$
$A_3 = R_3 + B_3 - R_1 B_2 - R_2 B_1$
...
$A_{H+1} = B_{H+1} - R_1 B_H - \ldots - R_H B_1$
...
$A_{G+1} = -R_1 B_G - R_2 B_{G-1} - \ldots - R_H B_{G-H+1}$
...
$A_{G+H} = -R_H B_G$


We can apply theorems 2.3 or 2.4 to 2.54 to obtain optimality properties for the MLE of that model. The parameter map of theorem 1.4 is displayed above. It is clear that this function has the smoothness properties required. The map is 1-1 if there exist unique solutions $R_1, \ldots, R_H$ to the first H equations, when $\theta_j$, j = 1,...,H, are given. The equations for $A_1$ through $A_G$ define $B_1, \ldots, B_G$. The parameters $A_{G+1}$ through $A_{G+H}$ are mapped to themselves. It should be noted that this method applies to any linear model with autoregressive disturbances.
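The parameter map can be illustrated in the scalar case. The sketch below is an illustration under stated assumptions, not the construction used in the proofs: it takes L = 1 and a single exogenous variable, uses zero pre-sample values, and the function names are hypothetical. It computes $A_j = R_j + B_j - \sum_{h+g=j} R_h B_g$ and $\theta_h$, then simulates the original model and checks that the transformed equation returns the innovations $u_t$.

```python
import random

def ar_disturbance_map(B, R, gamma):
    """Scalar version of the map (B, R, gamma) -> (A_1..A_{G+H}, theta_0..theta_H)."""
    G, H = len(B), len(R)
    A = []
    for j in range(1, G + H + 1):
        a = (R[j - 1] if j <= H else 0.0) + (B[j - 1] if j <= G else 0.0)
        # subtract R_h * B_g over all pairs with h + g = j
        a -= sum(R[h - 1] * B[j - h - 1] for h in range(1, H + 1) if 1 <= j - h <= G)
        A.append(a)
    theta = [gamma] + [-r * gamma for r in R]
    return A, theta

def lag(series, t):
    """Value of the series at time t, with zero pre-sample values (an assumption)."""
    return series[t] if t >= 0 else 0.0

def max_residual_gap(B, R, gamma, n=200, seed=0):
    """Simulate the original model; return max |y_t - fitted_t - u_t| under the map."""
    rng = random.Random(seed)
    A, theta = ar_disturbance_map(B, R, gamma)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps, y = [], []
    for t in range(n):
        eps.append(sum(r * lag(eps, t - h) for h, r in enumerate(R, 1)) + u[t])
        y.append(sum(b * lag(y, t - g) for g, b in enumerate(B, 1))
                 + gamma * z[t] + eps[t])
    gap = 0.0
    for t in range(n):
        fitted = sum(a * lag(y, t - j) for j, a in enumerate(A, 1))
        fitted += sum(th * lag(z, t - h) for h, th in enumerate(theta))
        gap = max(gap, abs(y[t] - fitted - u[t]))
    return gap

# Hypothetical values with G = 2 and H = 1.
print(ar_disturbance_map([0.5, 0.3], [0.4], 2.0))
print(max_residual_gap([0.5, 0.3], [0.4], 2.0))
```

The gap returned by `max_residual_gap` should be at the level of floating point rounding, which is a numerical restatement of the identity obtained by subtracting the lagged equations.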


CHAPTER  III:  NONLINEAR  MODELS 


3.1  Results  from  the  Literature 

Recent contributions have been made to the asymptotic theory of estimators for nonlinear models, and to computational methods for these estimators. Malinvaud [1966] deals with the nonlinear model

3.1  $y_t = g(x_t; \alpha) + \varepsilon_t, \quad t = 1, 2, \ldots$

where $y_t$ denotes the Lx1 dimensioned endogenous variable, $x_t$ denotes the Kx1 dimensioned exogenous variable, $\alpha$ denotes the px1 vector of unknown parameters, and $\varepsilon_t$ denotes the Lx1 dimensioned error variable. Malinvaud chooses the estimator $\hat\alpha_n(S)$ which minimizes

$\sum_{t=1}^{n} (y_t - g(x_t; \alpha))^T S \, (y_t - g(x_t; \alpha))$

where S is a positive definite matrix. The following conditions are required to guarantee consistency and asymptotic normality for this estimator.


M1)  The error variables $\{\varepsilon_t\}$ are independently, identically distributed with mean vector 0 and nonsingular covariance matrix $\Sigma$.

M2)  Let W be any closed set in Euclidean p-space such that the true value $\alpha^0$ is in the complement of W. Let

$Q_n(S, \alpha) = \sum_{t=1}^{n} (g(x_t; \alpha) - g(x_t; \alpha^0))^T S \, (g(x_t; \alpha) - g(x_t; \alpha^0))$

As n → ∞,

$\sup_{\alpha \in W} \ (Q_n(S, \alpha))^{-1} \Big| \sum_{t=1}^{n} (\varepsilon_t)_i \, (g(x_t; \alpha) - g(x_t; \alpha^0))_h \Big|$

must converge stochastically to zero, for i,h = 1,...,L.

M3)  An open neighborhood V of the true parameter value $\alpha^0$ is contained in the set A of admissible values of $\alpha$.

M4)  In V, the functions g and their derivatives of the first three orders with respect to the unknown parameter are uniformly bounded in $x_t$ and $\alpha$. For any positive definite symmetric matrix S, the matrix

$n^{-1} \sum_{t=1}^{n} Z_t^T S Z_t$

where $(Z_t)_{ik} = \partial (g(x_t; \alpha))_i / \partial \alpha_k \, |_{\alpha^0}$, is nonsingular and tends to a nonsingular matrix as n → ∞.


Under assumptions M1 through M4, $\hat\alpha_n(S)$ is a consistent estimator for $\alpha^0$ and $n^{1/2} (\hat\alpha_n(S) - \alpha^0)$ is asymptotically normally distributed. Furthermore, if $\varepsilon_t$ is normally distributed, the estimator $\hat\alpha_n(M_n^{-1})$, where

$M_n = n^{-1} \sum_{t=1}^{n} (y_t - g(x_t; \hat\alpha_n(S))) (y_t - g(x_t; \hat\alpha_n(S)))^T$

attains the Cramer-Rao lower bound on the variance. Under the same conditions, Barnett [1974] shows the MLE are consistent and asymptotically normally distributed, with the covariance matrix of the limiting distribution equal to the inverse of the information matrix.

Several authors have proven strong consistency for least squares estimators of parameters of nonlinear models. For the univariate model Jennrich [1969] uses 3.1, where L = 1. The exogenous variables may be random or nonrandom. If they are assumed to be random, the limits in conditions J1, J2, and J3 below must exist almost surely, and the exogenous variables are independent of the error variables.

J1)  $\lim_{n \to \infty} n^{-1} \sum_{t=1}^{n} (g(x_t; \alpha))^2$ must exist. Denote the limit as $g(\alpha)$. Furthermore, $|g(\alpha) - g(\alpha^0)|^2$ has a unique minimum over $\alpha$ in the compact set A at the true parameter value $\alpha^0$.

Assumption J1 implies the least squares estimator $\hat\alpha_n$ of $\alpha^0$ and the estimator

$\hat\sigma_n^2 = n^{-1} \sum_{t=1}^{n} (y_t - g(x_t; \hat\alpha_n))^2$

of $\sigma^2$ converge almost surely to their true values.

J2)  All first and second derivatives of $g(x_t; \alpha)$ with respect to elements of $\alpha$ must exist and be continuous on A. All limits of the form

$\lim_{n \to \infty} n^{-1} \sum_{t=1}^{n} h_1(x_t; \alpha) \, h_2(x_t; \alpha)$

exist, where $h_1, h_2$ are any of g, $g'_i = \partial g / \partial \alpha_i$, $g''_{ij} = \partial^2 g / \partial \alpha_i \partial \alpha_j$, which must exist.

J3)  The true parameter value $\alpha^0$ must be an interior point of A, and the matrix whose i,j'th element is

$\lim_{n \to \infty} n^{-1} \sum_{t=1}^{n} g'_i(x_t; \alpha) \, g'_j(x_t; \alpha) \, |_{\alpha^0}$

is nonsingular.

Under conditions J1, J2, J3, the least squares estimator of $\alpha$, centered at $\alpha^0$ and scaled by $n^{1/2}$, is asymptotically normally distributed with mean 0. The inverse of the asymptotic covariance matrix is given in condition J3. Furthermore, if normality of the error terms is assumed, then the least squares estimator is efficient in the Rao sense. A consistent estimator $T_n$ of $\alpha$ is said to be efficient in the Rao sense if

$n^{1/2} \, |T_n - \alpha^0 - B z_n|$

converges stochastically to zero, where B is a matrix of constants which may depend on $\alpha$, and $z_n$ is a vector whose i'th component is

$n^{-1} \, \partial \log L_n(\alpha, \sigma^2) / \partial \alpha_i$
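The least squares estimator and the variance estimator of condition J1 can be sketched for a one-parameter model. This is an illustration only: the regression function $g(x; \alpha) = \exp(-\alpha x)$, the search interval, and the golden-section search are assumptions chosen for the example and are not taken from Jennrich's paper. Golden-section search is adequate here because the sum of squares has a single minimum on the interval for this choice of g.

```python
import math

def g(x, alpha):
    """Hypothetical regression function, chosen only for illustration."""
    return math.exp(-alpha * x)

def sum_of_squares(alpha, xs, ys):
    return sum((y - g(x, alpha)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of the sum of squares on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = sum_of_squares(c, xs, ys), sum_of_squares(d, xs, ys)
    while b - a > tol:
        if fc < fd:                         # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = sum_of_squares(c, xs, ys)
        else:                               # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = sum_of_squares(d, xs, ys)
    return (a + b) / 2.0

def variance_estimate(alpha_hat, xs, ys):
    """n^{-1} times the sum of squared residuals at the least squares estimate."""
    return sum_of_squares(alpha_hat, xs, ys) / len(xs)

# Noise-free data generated at a hypothetical true value of 0.7.
xs = [0.1 * i for i in range(1, 31)]
ys = [g(x, 0.7) for x in xs]
alpha_hat = least_squares(xs, ys, 0.0, 2.0)
print(alpha_hat, variance_estimate(alpha_hat, xs, ys))
```

With noisy observations the same two functions apply unchanged; only the data would differ.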


Hannan [1971] extends Jennrich's result to the case where $\varepsilon_t$ is generated by a stationary time series. Specifically, Hannan assumes $\varepsilon_t$ is of the form

$\varepsilon_t = \sum_{j=-\infty}^{\infty} \gamma_j \eta_{t-j}$

where

$\sum_{j=-\infty}^{\infty} \gamma_j^2 < \infty$

The variables $\{\eta_t\}$ are assumed to be independently, identically distributed with mean 0 and variance 1. The following conditions are required.

H1)  The spectrum of $\varepsilon_t$, which is

$f(\lambda) = (2\pi)^{-1} \Big| \sum_{j=-\infty}^{\infty} \gamma_j \exp(ij\lambda) \Big|^2$

is continuous in $\lambda$. The set A of possible values of $\alpha$ is a compact set.

H2)  The exogenous variables $\{x_t\}$ are independent of $\{\varepsilon_t\}$.

$\lim_{n \to \infty} n^{-1} \sum_{t=1}^{n} g(x_t; \alpha^1) \, g(x_{t+k}; \alpha^2) = g(k, \alpha^1, \alpha^2)$

exists almost surely for k = 0, ±1, ... . The convergence is uniform in $\alpha^1$ and $\alpha^2$.

H3)  $g(x; \alpha)$ is twice differentiable in $\alpha$. Uniformly in $\alpha$, $\alpha^1$, and $\alpha^2$, the following limits exist almost surely, for i,j,k = 1,...,p and for c ≥ 0:

$\lim_{n \to \infty} n^{-1} \sum_{t=1}^{n} g'_j(x_t; \alpha^1) \, g'_k(x_{t+c}; \alpha^2)$

$\lim_{n \to \infty} n^{-1} \sum_{t=1}^{n} g''_{ik}(x_t; \alpha^1) \, g''_{jk}(x_{t+c}; \alpha^2)$

H4)  $g(0, \alpha, \alpha) + g(0, \alpha^0, \alpha^0) - 2 g(0, \alpha, \alpha^0) > 0$ for $\alpha \ne \alpha^0$.

Under H1, H2, and H4, if $\alpha^0$ is an interior point of A, then the least squares estimator $\hat\alpha_n$ of $\alpha^0$ is consistent and asymptotically normally distributed.

Robinson [1972] extends Hannan's results to the multivariate model

$y_t = B \, g(x_t) + \varepsilon_t$

where

$\varepsilon_t = \sum_{j=-\infty}^{\infty} \Gamma_j u_{t-j}, \qquad \sum_{j=-\infty}^{\infty} \|\Gamma_j\|^2 < \infty$

where $\{\Gamma_j\}$ are LxL matrices and the $\{u_t\}$ have mean vector 0 and LxL identity covariance matrix. $\|\Gamma_j\|$ is defined as $[\operatorname{tr}(\Gamma_j \Gamma_j^*)]^{1/2}$, where $\Gamma_j^*$ is the adjoint of $\Gamma_j$. The parameters $\alpha$ must satisfy s nonlinear equations

$\theta_1(\alpha) = 0$
...
$\theta_s(\alpha) = 0$

which have continuous first derivatives. Under conditions which are the multivariate extensions of Hannan's conditions, the estimators of B and $\alpha$ which minimize the residual sum of squares subject to the s nonlinear equations are consistent and asymptotically normally distributed.

3.2  Optimality  Results  for  Nonrandom  Exogenous  Variables  and 
Independent  Error  Terms 

We shall consider the multivariate model 3.1 under several different assumptions. In this section, the simplest assumptions are made. The exogenous variables are assumed to be nonrandom and the error terms are assumed to be independently distributed. For the vector valued function g we use the notation

$g'_i(x_t; \alpha) = \partial g(x_t; \alpha) / \partial \alpha_i$

$g''_{ij}(x_t; \alpha) = \partial^2 g(x_t; \alpha) / \partial \alpha_i \partial \alpha_j$

Theorem 3.1

For model 3.1, assume the exogenous variables $\{x_t, t = 1, \ldots\}$ are nonrandom and the error variables $\{\varepsilon_t, t = 1, 2, \ldots\}$ are independently, identically distributed, multivariate normal with mean vector 0 and unknown, positive definite covariance matrix $\Sigma$. Assume also

1)  $n^{-1} \sum_{t=1}^{n} g'_i(x_t; \alpha)^T \Sigma^{-1} g'_j(x_t; \alpha)$ converges as n → ∞, uniformly for $\alpha$ in any compact set.

2)  $|g'_i(x_t; \alpha)|$ is uniformly bounded in $\{x_t\}$ and $\alpha$, for $\alpha$ in any compact set, for i = 1,...,p.

3)  $|g''_{ij}(x_t; \alpha)|$ is uniformly bounded in $\{x_t\}$ and $\alpha$, for $\alpha$ in any compact set, for i,j = 1,...,p.

Then the MLE of $\alpha$ and $\Sigma$ are consistent, asymptotically normally distributed, and efficient in the MP sense.

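Proof:

To apply theorems 1.1, 1.2, and 1.3, we calculate the second partial derivatives of the logarithm of the joint density of $(y_1, \ldots, y_n)$, $L_n(\alpha, \Sigma)$. Throughout, $\varepsilon_t = y_t - g(x_t; \alpha)$.

3.2  $-n^{-1} \, \partial^2 \log L_n(\alpha, \Sigma) / \partial \sigma_{ij} \partial \sigma_{hk} = -(\sigma^{hi} \sigma^{jk} + \sigma^{hj} \sigma^{ik})$

     $+ \ \sigma^{jk} n^{-1} \sum_{t=1}^{n} \varepsilon_t^T (\Sigma^{-1})_{\cdot h} \, \varepsilon_t^T (\Sigma^{-1})_{\cdot i}$

     $+ \ \sigma^{ik} n^{-1} \sum_{t=1}^{n} \varepsilon_t^T (\Sigma^{-1})_{\cdot h} \, \varepsilon_t^T (\Sigma^{-1})_{\cdot j}$

     $+ \ \sigma^{jh} n^{-1} \sum_{t=1}^{n} \varepsilon_t^T (\Sigma^{-1})_{\cdot k} \, \varepsilon_t^T (\Sigma^{-1})_{\cdot i}$

     $+ \ \sigma^{ih} n^{-1} \sum_{t=1}^{n} \varepsilon_t^T (\Sigma^{-1})_{\cdot k} \, \varepsilon_t^T (\Sigma^{-1})_{\cdot j}$

     for i ≠ j, h ≠ k; i,j,h,k = 1,...,L.

3.3  $-n^{-1} \, \partial^2 \log L_n(\alpha, \Sigma) / \partial \sigma_{ij} \partial \sigma_{hh} = -\sigma^{hi} \sigma^{hj}$

     $+ \ \sigma^{jh} n^{-1} \sum_{t=1}^{n} \varepsilon_t^T (\Sigma^{-1})_{\cdot h} \, \varepsilon_t^T (\Sigma^{-1})_{\cdot i}$

     $+ \ \sigma^{ih} n^{-1} \sum_{t=1}^{n} \varepsilon_t^T (\Sigma^{-1})_{\cdot h} \, \varepsilon_t^T (\Sigma^{-1})_{\cdot j}$

     for i ≠ j; i,j,h = 1,...,L.

3.4  $-n^{-1} \, \partial^2 \log L_n(\alpha, \Sigma) / \partial \sigma_{ii} \partial \sigma_{hh} = -(1/2) \sigma^{hi} \sigma^{hi} + \sigma^{hi} n^{-1} \sum_{t=1}^{n} \varepsilon_t^T (\Sigma^{-1})_{\cdot h} \, \varepsilon_t^T (\Sigma^{-1})_{\cdot i}$

     for i,h = 1,...,L.

3.5  $-n^{-1} \, \partial^2 \log L_n(\alpha, \Sigma) / \partial \alpha_k \partial \sigma_{ij} = n^{-1} \sum_{t=1}^{n} \big[ g'_k(x_t; \alpha)^T (\Sigma^{-1})_{\cdot i} \, \varepsilon_t^T (\Sigma^{-1})_{\cdot j} + g'_k(x_t; \alpha)^T (\Sigma^{-1})_{\cdot j} \, \varepsilon_t^T (\Sigma^{-1})_{\cdot i} \big]$

     for i ≠ j, k = 1,...,p; i,j = 1,...,L.

3.6  $-n^{-1} \, \partial^2 \log L_n(\alpha, \Sigma) / \partial \alpha_k \partial \sigma_{ii} = n^{-1} \sum_{t=1}^{n} g'_k(x_t; \alpha)^T (\Sigma^{-1})_{\cdot i} \, \varepsilon_t^T (\Sigma^{-1})_{\cdot i}$

     for k = 1,...,p; for i = 1,...,L.

3.7  $-n^{-1} \, \partial^2 \log L_n(\alpha, \Sigma) / \partial \alpha_i \partial \alpha_h = -n^{-1} \sum_{t=1}^{n} g''_{ih}(x_t; \alpha)^T \Sigma^{-1} \varepsilon_t + n^{-1} \sum_{t=1}^{n} g'_i(x_t; \alpha)^T \Sigma^{-1} g'_h(x_t; \alpha)$

     for i,h = 1,...,p.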


The expectations and variances of 3.2 through 3.7 will be shown to satisfy the assumptions of theorems 1.1, 1.2, and 1.3. The expected value of 3.2 is

3.8  $\sigma^{jk} \sigma^{hi} + \sigma^{ik} \sigma^{hj}$

and the variance of 3.2 is equal to $n^{-1}$ times a continuous function of the elements of $\Sigma$, which goes to zero uniformly for $\Sigma$ in a compact set. Terms 3.3 and 3.4 may be analyzed similarly.

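The expectation of 3.5 is zero and the variance is

3.9  $n^{-1} \Big[ \sigma^{jj} n^{-1} \sum_{t=1}^{n} \big( g'_k(x_t; \alpha)^T (\Sigma^{-1})_{\cdot i} \big)^2 + \sigma^{ii} n^{-1} \sum_{t=1}^{n} \big( g'_k(x_t; \alpha)^T (\Sigma^{-1})_{\cdot j} \big)^2$

     $+ \ 2 \sigma^{ij} n^{-1} \sum_{t=1}^{n} g'_k(x_t; \alpha)^T (\Sigma^{-1})_{\cdot i} \, g'_k(x_t; \alpha)^T (\Sigma^{-1})_{\cdot j} \Big]$

By assumption 2, 3.9 goes to zero uniformly for $\alpha, \Sigma$ in a compact set. The argument for term 3.6 is identical.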

The expectation of 3.7 is

3.10  $n^{-1} \sum_{t=1}^{n} g'_i(x_t; \alpha)^T \Sigma^{-1} g'_h(x_t; \alpha)$

which, by assumption 1, converges as n → ∞, uniformly for $\alpha, \Sigma$ in a compact set. Since $g'_i$ is continuous as a function of $\alpha$, the limit of 3.10 is continuous in $\alpha$.

The variance of 3.7 is

$n^{-1} \Big[ n^{-1} \sum_{t=1}^{n} g''_{ih}(x_t; \alpha)^T \Sigma^{-1} g''_{ih}(x_t; \alpha) \Big]$

which, by assumption 3, converges to zero as n → ∞, uniformly for $\alpha, \Sigma$ in a compact set.

The conditions of theorems 1.1, 1.2, 1.3 are satisfied; the MLE of $\alpha, \Sigma$ have the desired properties.

3.3  Optimality Results for Nonrandom Exogenous Variables and Error
Terms which Form a Stationary Process

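The independence restriction on the error terms may be relaxed. Suppose, specifically, that

$\operatorname{cov}(\varepsilon_t, \varepsilon_s) = \rho^{|t-s|} \Sigma$

where $|\rho| < 1$. Since the $\{\varepsilon_t\}$ are normally distributed, they form a stationary process. The conditional distribution of $\varepsilon_t$, given $\varepsilon_{t-1}, \ldots, \varepsilon_1$, is

$[2\pi (1 - \rho^2)]^{-L/2} |\Sigma|^{-1/2} \exp\{ -[2 (1 - \rho^2)]^{-1} (\varepsilon_t - \rho \varepsilon_{t-1})^T \Sigma^{-1} (\varepsilon_t - \rho \varepsilon_{t-1}) \}$

The joint density $L_n(\alpha, \rho, \Sigma)$ of $y_1, \ldots, y_n$ is the product over t from 2 to n of the conditional densities times the unconditional density of $\varepsilon_1$.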

Theorem 3.2

For the model 3.1, assume the variables $\{x_t, t = 1, 2, \ldots\}$ are nonrandom and the error variables $\{\varepsilon_t, t = 1, 2, \ldots\}$ form a multivariate stationary process whose density is multivariate normal with mean 0 and covariance function $\rho^{|t-s|} \Sigma$, where $\rho, \Sigma$ are unknown except that $0 \le \rho < 1$ and $\Sigma$ is positive definite. Under assumptions 1, 2, 3 of theorem 3.1 and

4)  $n^{-1} \sum_{t=2}^{n} g'_i(x_t; \alpha)^T \Sigma^{-1} g'_j(x_{t-1}; \alpha)$ converges as n → ∞ uniformly for $\alpha, \Sigma$ in any compact set.

Then the MLE of $\alpha, \rho, \Sigma$ are consistent, asymptotically normally distributed, and efficient in the MP sense.

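Proof:

The second derivatives of the log likelihood function are computed so that theorems 1.1, 1.2, and 1.3 may be applied. Let

$u_t = \varepsilon_t - \rho \varepsilon_{t-1}, \quad t = 2, 3, \ldots$
$u_1 = \varepsilon_1$

Then the joint distribution of $\{u_t, t = 1, \ldots, n\}$ is multivariate normal with mean zero and

$\operatorname{cov}(u_t, u_s) = \delta_{ts} (1 - \rho^2) \Sigma, \quad t, s \ge 2,$

while $u_1$ has covariance matrix $\Sigma$ and is independent of the remaining $u_t$. The terms below will use this notation.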


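3.11  $-n^{-1} \, \partial^2 \log L_n(\alpha, \rho, \Sigma) / \partial \sigma_{ij} \partial \sigma_{hk} = -\sigma^{hi} \sigma^{jk} - \sigma^{hj} \sigma^{ik}$

      $+ \ \sigma^{jk} n^{-1} u_1^T (\Sigma^{-1})_{\cdot h} \, u_1^T (\Sigma^{-1})_{\cdot i} + (1 - \rho^2)^{-1} \sigma^{jk} n^{-1} \sum_{t=2}^{n} u_t^T (\Sigma^{-1})_{\cdot h} \, u_t^T (\Sigma^{-1})_{\cdot i}$

      $+ \ \sigma^{ik} n^{-1} u_1^T (\Sigma^{-1})_{\cdot h} \, u_1^T (\Sigma^{-1})_{\cdot j} + (1 - \rho^2)^{-1} \sigma^{ik} n^{-1} \sum_{t=2}^{n} u_t^T (\Sigma^{-1})_{\cdot h} \, u_t^T (\Sigma^{-1})_{\cdot j}$

      $+ \ \sigma^{jh} n^{-1} u_1^T (\Sigma^{-1})_{\cdot k} \, u_1^T (\Sigma^{-1})_{\cdot i} + (1 - \rho^2)^{-1} \sigma^{jh} n^{-1} \sum_{t=2}^{n} u_t^T (\Sigma^{-1})_{\cdot k} \, u_t^T (\Sigma^{-1})_{\cdot i}$

      $+ \ \sigma^{ih} n^{-1} u_1^T (\Sigma^{-1})_{\cdot k} \, u_1^T (\Sigma^{-1})_{\cdot j} + (1 - \rho^2)^{-1} \sigma^{ih} n^{-1} \sum_{t=2}^{n} u_t^T (\Sigma^{-1})_{\cdot k} \, u_t^T (\Sigma^{-1})_{\cdot j}$

      for i ≠ j; h ≠ k; i,j,h,k = 1,...,L.

3.12  $-n^{-1} \, \partial^2 \log L_n(\alpha, \rho, \Sigma) / \partial \sigma_{ij} \partial \sigma_{hh} = -\sigma^{hi} \sigma^{hj}$

      $+ \ \sigma^{jh} n^{-1} u_1^T (\Sigma^{-1})_{\cdot h} \, u_1^T (\Sigma^{-1})_{\cdot i} + (1 - \rho^2)^{-1} \sigma^{jh} n^{-1} \sum_{t=2}^{n} u_t^T (\Sigma^{-1})_{\cdot h} \, u_t^T (\Sigma^{-1})_{\cdot i}$

      $+ \ \sigma^{ih} n^{-1} u_1^T (\Sigma^{-1})_{\cdot h} \, u_1^T (\Sigma^{-1})_{\cdot j} + (1 - \rho^2)^{-1} \sigma^{ih} n^{-1} \sum_{t=2}^{n} u_t^T (\Sigma^{-1})_{\cdot h} \, u_t^T (\Sigma^{-1})_{\cdot j}$

      for i ≠ j; i,j,h = 1,...,L.

3.13  $-n^{-1} \, \partial^2 \log L_n(\alpha, \rho, \Sigma) / \partial \sigma_{ii} \partial \sigma_{hh} = -(1/2) \sigma^{hi} \sigma^{hi}$

      $+ \ \sigma^{hi} n^{-1} u_1^T (\Sigma^{-1})_{\cdot h} \, u_1^T (\Sigma^{-1})_{\cdot i} + (1 - \rho^2)^{-1} \sigma^{hi} n^{-1} \sum_{t=2}^{n} u_t^T (\Sigma^{-1})_{\cdot h} \, u_t^T (\Sigma^{-1})_{\cdot i}$

      for i,h = 1,...,L.

3.14  $-n^{-1} \, \partial^2 \log L_n(\alpha, \rho, \Sigma) / \partial \alpha_k \partial \sigma_{ij} =$

      $n^{-1} g'_k(x_1; \alpha)^T (\Sigma^{-1})_{\cdot i} \, u_1^T (\Sigma^{-1})_{\cdot j} + n^{-1} g'_k(x_1; \alpha)^T (\Sigma^{-1})_{\cdot j} \, u_1^T (\Sigma^{-1})_{\cdot i}$

      $+ \ (1 - \rho^2)^{-1} n^{-1} \sum_{t=2}^{n} [g'_k(x_t; \alpha) - \rho g'_k(x_{t-1}; \alpha)]^T (\Sigma^{-1})_{\cdot i} \, u_t^T (\Sigma^{-1})_{\cdot j}$

      $+ \ (1 - \rho^2)^{-1} n^{-1} \sum_{t=2}^{n} [g'_k(x_t; \alpha) - \rho g'_k(x_{t-1}; \alpha)]^T (\Sigma^{-1})_{\cdot j} \, u_t^T (\Sigma^{-1})_{\cdot i}$

      for i ≠ j; i,j = 1,...,L; k = 1,...,p.

3.15  $-n^{-1} \, \partial^2 \log L_n(\alpha, \rho, \Sigma) / \partial \alpha_k \partial \sigma_{ii} = n^{-1} g'_k(x_1; \alpha)^T (\Sigma^{-1})_{\cdot i} \, u_1^T (\Sigma^{-1})_{\cdot i}$

      $+ \ (1 - \rho^2)^{-1} n^{-1} \sum_{t=2}^{n} [g'_k(x_t; \alpha) - \rho g'_k(x_{t-1}; \alpha)]^T (\Sigma^{-1})_{\cdot i} \, u_t^T (\Sigma^{-1})_{\cdot i}$

      for i = 1,...,L; k = 1,...,p.

3.16  $-n^{-1} \, \partial^2 \log L_n(\alpha, \rho, \Sigma) / \partial \alpha_i \partial \alpha_j = -n^{-1} g''_{ij}(x_1; \alpha)^T \Sigma^{-1} u_1 + n^{-1} g'_i(x_1; \alpha)^T \Sigma^{-1} g'_j(x_1; \alpha)$

      $- \ (1 - \rho^2)^{-1} n^{-1} \sum_{t=2}^{n} [g''_{ij}(x_t; \alpha) - \rho g''_{ij}(x_{t-1}; \alpha)]^T \Sigma^{-1} u_t$

      $+ \ (1 - \rho^2)^{-1} n^{-1} \sum_{t=2}^{n} [g'_i(x_t; \alpha) - \rho g'_i(x_{t-1}; \alpha)]^T \Sigma^{-1} [g'_j(x_t; \alpha) - \rho g'_j(x_{t-1}; \alpha)]$

      for i,j = 1,...,p.

3.20  $-n^{-1} \, \partial^2 \log L_n(\alpha, \rho, \Sigma) / \partial \rho^2 = -(n-1) n^{-1} L (1 + \rho^2)(1 - \rho^2)^{-2}$

      $+ \ (1 + 3\rho^2)(1 - \rho^2)^{-3} n^{-1} \sum_{t=2}^{n} u_t^T \Sigma^{-1} u_t$

      $- \ 4\rho (1 - \rho^2)^{-2} n^{-1} \sum_{t=2}^{n} \varepsilon_{t-1}^T \Sigma^{-1} u_t$

      $+ \ (1 - \rho^2)^{-1} n^{-1} \sum_{t=2}^{n} \varepsilon_{t-1}^T \Sigma^{-1} \varepsilon_{t-1}$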

To apply theorems 1.1, 1.2, and 1.3 to terms 3.11 through 3.20, the expected values of these terms will be shown to converge as n → ∞, uniformly in the parameters, to a limit function, and the variances of these terms will be shown to converge, as n → ∞, uniformly in the unknown parameters, to zero.

The expected value of 3.11 is equal to 3.8. To compute the variance term, we require the following, for t, s ≥ 2.

$\operatorname{cov}\big( u_t^T (\Sigma^{-1})_{\cdot h} \, u_t^T (\Sigma^{-1})_{\cdot i}, \ u_s^T (\Sigma^{-1})_{\cdot j} \, u_s^T (\Sigma^{-1})_{\cdot k} \big) = \delta_{ts} (1 - \rho^2)^2 (\sigma^{hj} \sigma^{ik} + \sigma^{hk} \sigma^{ij})$

Since all terms in the variance are zero unless t = s, the variance is a continuous function of the parameters which is of order $n^{-1}$. The assumptions of theorems 1.1 and 1.2 hold. By the same arguments the assumptions may be shown to hold for 3.12 and 3.13.

The expectation of 3.14 is zero, since the $\{u_t, t = 1, \ldots\}$ have zero expectation. We will compute one of the terms in the variance of 3.14, the covariance of the third and fourth terms. This covariance is

$(1 - \rho^2)^{-1} \sigma^{ij} n^{-2} \sum_{t=2}^{n} [g'_k(x_t; \alpha) - \rho g'_k(x_{t-1}; \alpha)]^T (\Sigma^{-1})_{\cdot i} \ [g'_k(x_t; \alpha) - \rho g'_k(x_{t-1}; \alpha)]^T (\Sigma^{-1})_{\cdot j}$

By assumption 2, this term converges to zero as n → ∞ uniformly for the parameters in any compact set. The other terms of the variance of 3.14 converge to zero by the same argument. Thus the assumptions of theorems 1.1 and 1.2 are satisfied for 3.14 and 3.15.

The expected value of 3.16 is

$n^{-1} g'_i(x_1; \alpha)^T \Sigma^{-1} g'_j(x_1; \alpha) + (1 - \rho^2)^{-1} n^{-1} \sum_{t=2}^{n} [g'_i(x_t; \alpha) - \rho g'_i(x_{t-1}; \alpha)]^T \Sigma^{-1} [g'_j(x_t; \alpha) - \rho g'_j(x_{t-1}; \alpha)]$

By assumptions 1 and 4, this term converges as n → ∞ uniformly for $\alpha, \Sigma, \rho$ in a compact set. To compute the variance of 3.16, we note that the variance of the third term is

$(1 - \rho^2)^{-1} n^{-2} \sum_{t=2}^{n} [g''_{ij}(x_t; \alpha) - \rho g''_{ij}(x_{t-1}; \alpha)]^T \Sigma^{-1} [g''_{ij}(x_t; \alpha) - \rho g''_{ij}(x_{t-1}; \alpha)]$

By assumption 3, this term goes to zero as n → ∞, uniformly for the parameters in a compact set. Therefore, by assumption 3, the variance of 3.16 satisfies the conditions of theorem 1.1.

To compute the expected value of 3.17, we note that

$E(\varepsilon_{t+k})(\varepsilon_t - \rho \varepsilon_{t-1})^T = \rho^k (1 - \rho^2) \Sigma$  for k ≥ 0

$E(\varepsilon_{t-m})(\varepsilon_t - \rho \varepsilon_{t-1})^T = 0$  for m ≥ 1

The expected value of 3.17 is


DO 


-2p C 1-p ^ ) * (n-l)n  1a.. 

ij 


which  satisfies  the  conditions  of  theorem  1.1.  To  compute  the 
variance  of  3.17,  we  note  the  following.  The  variance  of  the  first 
term  of  3.17  is 


4p~(l-p")  x(l-n  x)n  1 (oii  Cjj  + o„) 


To compute the variance of the second term of 3.17, note that

$\operatorname{cov}\big( (\varepsilon_{t-1})_a (u_t)_b, \ (\varepsilon_{s-1})_c (u_s)_d \big) = \delta_{ts} (1 - \rho^2) \sigma_{ac} \sigma_{bd}$


so  that  the  variance  of  the  second  term  is 


2 -1  -1 

(1-p  ) n a . . a . . 

ii  ID 


The variance of the third term is computed similarly to the variance of the second term. To compute the covariance of the first and second terms of 3.17, we note that

$\operatorname{cov}\big( (u_t)_a (u_t)_b, \ (\varepsilon_{s-1})_c (u_s)_d \big) = 0$

Therefore the covariance of the first and second terms of 3.17 is zero. By the same argument the covariance of the first and third terms of 3.17 is zero. Finally we conclude that the variance of 3.17 converges to zero uniformly for the parameters in a compact set. Term 3.18 can be shown to satisfy the conditions of theorems 1.1 and 1.2 by the same argument.

The expected value of 3.19 is zero, since the $\{u_t\}$ have expectation zero. The variance of the first term of 3.19 is


4p"(l-p")  3n  2 T [g!(x  ;n)-pg!(x  ..  ;n)]  T ;a)-pg'.  (:<  . w)] 

. ~ 1 “ ■ L — ± — -i-  L 1 

X-2. 


which, by assumption 2, converges to zero uniformly for the parameters in a compact set. Similarly, the variance of the second term of 3.19 goes to zero. The variance of the third term of 3.19 is


2 -9  _2  5 
(1-p2)  n l 

T = 0 


n 

l p1 

s — /. 


_1;a)]iS  XCg^xt;a)-pg^(_x{;_1;n)] 


which, by assumption 2, also converges to zero as required. The covariance of the second and third terms of 3.19 is


(1-P 


!,-l  -2 
) n 


n-1  n 

i l 

t-2  s=t+l 


s-l-tr 


Cg  • ( X t ; a ) ]Tz'~ 1C F [ ( xt ; « ) -pg ! ( .'<t  _ 1 ; a ) ] 


which,  by  assumption  2,  converges  to  zero  uniformly  for  the  parameters 
in  a compact  set.  The  covariance  of  the  first  and  third  terms  of  3.19 
behaves  similarly.  Thus  we  may  conclude  that  3.19  satisfies  the 
conditions  of  theorems  1.1  and  1.2. 

The  expected  value  of  3.20  is 


$-(1 - n^{-1}) L \big[ (1 + \rho^2)(1 - \rho^2)^{-2} - (1 + 3\rho^2)(1 - \rho^2)^{-2} - (1 - \rho^2)^{-1} \big]$

which clearly converges as n → ∞, uniformly for $\rho$ in a compact set in [0,1). The variance of the second term of 3.20 is

i L 

2(l+3p2)2  (l-n'1)n_1  7 1 (o,,)2 

b=l  d-1  bd 

which obviously goes to zero as n → ∞, uniformly for values of $\rho, \Sigma$ in a compact set. The variance of the third term is

2 2-1  -1  -1  - v 2 

16p  (1-p  ) ( 1-n  )n  2,  1 * ) 

b=l  d=l 

which  goes  to  zero  as  required.  The  variance  of  the  fourth  term  of  3.20 
is 


2 ( .1— p ' ) 2[n~2 


l l l W 

s=2  b=l  d-1 


which goes to zero as required. By the same argument which proved that the covariance of the first and second terms of 3.17 was zero, the covariance of the second and third terms of 3.20 is zero. The covariance of the third and fourth terms of 3.20 is


-4p (l-p2 ) 2 (2  f y (a  )" } n l 

b=l  d=l  bc  t=2  s=t+l 


„ n -l  n 
2 * l p3"1"* 


which goes to zero as $n \to \infty$, uniformly for the parameters in a compact
set. The covariance of the second and fourth terms of 3.20 is

n-1 


<2  l i (o  )2}n‘2  l l P' 
b=l  d=l  t=2  r-ttl 


-1-t 


( 1 +3p2 ) 


8 i 


which converges to zero as required.

We have shown that the conditions of theorems 1.1 and 1.2 are
satisfied for terms 3.11 through 3.20. Applying theorem 1.3, we conclude
that the MLE of the parameters have the properties required.

3.4  Optimality Results for Random Exogenous Variables and Independent
Error Terms

Optimality properties may also be obtained for the model 3.1 when
the exogenous variables are random. Assuming that the $\{x_t\}$ are a
stationary process is not sufficient to obtain the results, since the
stochastic convergence of functions of $g_i'(x_t;\alpha)$ and $g_{ij}''(x_t;\alpha)$, which
are not necessarily stationary, is required.

Theorem 3.3

For model 3.1, assume the variables $\{x_t\}$ are random and identically
distributed, independently of the error terms $\{\epsilon_t\}$, which are
independently distributed, multivariate normal with mean $0$ and positive
definite covariance $\Sigma$. Assume also that

1)  $|E(g_i'(x_t;\alpha) g_j'(x_s;\alpha)^T)|$ exists, and all elements are
bounded above uniformly for $\alpha$ in any compact set,
and for $t,s = 1,2,\ldots$

2)  $|E(g_{ij}''(x_t;\alpha) g_{hk}''(x_s;\alpha)^T)|$ exists, and all elements are
bounded above uniformly for $\alpha$ in any compact set,
and for $t,s = 1,2,\ldots$

3)  There exist constants $C > 0$ and $0 < \eta < 1$, such that
for $i,j = 1,\ldots,p$ and for $t,s = 1,2,\ldots$

\[ |\mathrm{cov}(g_i'(x_t;\alpha)^T \Sigma^{-1} g_j'(x_t;\alpha),\; g_i'(x_s;\alpha)^T \Sigma^{-1} g_j'(x_s;\alpha))| \leq C \eta^{|t-s|} \]

for all $\alpha$ in any compact set.

Then the MLE of $\alpha$ and $\Sigma$ are consistent, asymptotically normally
distributed, and efficient in the MP sense.

Proof:

The second partial derivatives of the log likelihood function are
terms 3.2 through 3.7, since the $\{x_t\}$ are distributed independently
of $\{\epsilon_t\}$, and their common density is not a function of the unknown
parameters. The expectation and variance of terms 3.2, 3.3, and 3.4
are as calculated in the proof of theorem 3.1.

The  expected  value  of  term  3.5  is  zero  and  the  variance  is 

r.  h 1 ) _ p 

+ 2clj(£  ')•  E(g/ (x;a)g* (x;a)T)(£  1)  . 

J*  — X — — — K ■ — .1 

+ a^'  <.£  1 ) . E(jd,'  (x;a)g'  (x^)"1  )(E  J‘)  .} 
j . k — k . j 


By assumption 1, the term in brackets above is bounded uniformly for $\alpha$,
$\Sigma$ in a compact set. Therefore the conditions of theorems 1.1 and 1.2
hold for 3.5. The mean and variance of 3.6 are computed similarly.

The expected value of 3.7 is

\[ E(g_i'(x;\alpha)^T \Sigma^{-1} g_j'(x;\alpha)) \]

which exists by assumption 1 and is continuous as a function of $\alpha$
since $g_{ij}''$ exists. The variance is

\[ n^{-1} E(g_{ij}''(x;\alpha)^T \Sigma^{-1} g_{ij}''(x;\alpha)) + n^{-2} \sum_{t=1}^{n} \sum_{s=1}^{n} \mathrm{cov}(g_i'(x_t;\alpha)^T \Sigma^{-1} g_j'(x_t;\alpha),\; g_i'(x_s;\alpha)^T \Sigma^{-1} g_j'(x_s;\alpha)) \]


By assumption 2, the first term goes to 0 as $n \to \infty$ uniformly for
$\alpha, \Sigma$ in a compact set. By assumption 3, the uniform convergence of
the second term is assured. Therefore theorem 1.3 may be applied to
guarantee the optimal properties.
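
The way a geometric covariance bound forces the normalized double sum to vanish can be checked numerically. The following is only an illustrative sketch, not part of the original proof: it assumes each covariance is bounded by C * eta**|t-s| and compares the normalized double sum with the bound C(1+eta)/((1-eta)n) obtained from the geometric series. The function names are hypothetical.

```python
def normalized_double_sum(n, C, eta):
    """Return n**-2 * sum_{t=1..n} sum_{s=1..n} C * eta**|t-s|."""
    total = 0.0
    for lag in range(n):
        # lag 0 occurs n times; each lag k >= 1 occurs 2*(n-k) times
        count = n if lag == 0 else 2 * (n - lag)
        total += count * C * eta ** lag
    return total / n ** 2


def geometric_bound(n, C, eta):
    """Upper bound C*(1+eta)/((1-eta)*n) from summing the geometric series."""
    return C * (1 + eta) / ((1 - eta) * n)


if __name__ == "__main__":
    # Illustrative values only; C and eta are not taken from the text.
    for n in (10, 100, 1000):
        print(n, normalized_double_sum(n, 2.0, 0.8), geometric_bound(n, 2.0, 0.8))
```

Since the bound is of order 1/n and does not depend on the parameters except through C and eta, the convergence is uniform whenever C and eta can be chosen uniformly over the compact set.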

3.5  Optimality Properties for Random Exogenous Variables Independent
of the Error Variables, which Form a Stationary Process

We are able to prove optimal properties for the MLE when the
sequences $\{x_t\}$ and $\{\epsilon_t\}$ are independent, but the variables within
each sequence may be dependent, with certain restrictions. The
restriction that the $\{x_t\}$ be identically distributed is removed,
since it does not greatly simplify the assumptions.

Theorem 3.4

For the model 3.1, assume the variables $\{x_t\}$ are random and
distributed independently of the error variables $\{\epsilon_t\}$, which form
a multivariate stationary process. The common distribution of each
$\epsilon_t$ is multivariate normal with mean vector $0$. The covariance
matrix of $\epsilon_t$ and $\epsilon_s$ is $\rho^{|t-s|} \Sigma$ for $t,s = 1,2,\ldots$, where $\rho$ and
$\Sigma$ are unknown except that $0 < \rho < 1$ and $\Sigma$ is positive definite.
Assume also that

1)  $\lim_{n \to \infty} n^{-1} \sum_{t=2}^{n} E[g_i'(x_t;\alpha)^T \Sigma^{-1} g_j'(x_s;\alpha)]$ exists for $s=t$ or
$s=t-1$. For both limits the convergence is uniform for $\alpha$
in any compact set.

2)  The elements of $|E[g_i'(x_t;\alpha) g_j'(x_s;\alpha)^T]|$ are bounded above
uniformly for $\alpha$ in any compact set and for $s,t = 1,2,\ldots$

3)  The elements of $|E[g_{ij}''(x_t;\alpha) g_{hk}''(x_s;\alpha)^T]|$ are bounded
above uniformly for $\alpha$ in any compact set and for $s,t = 1,2,\ldots$

4)  $|\mathrm{cov}[g_i'(x_t;\alpha)^T \Sigma^{-1} g_j'(x_u;\alpha),\; g_i'(x_s;\alpha)^T \Sigma^{-1} g_j'(x_v;\alpha)]| \leq C r^{|t-s|}$
for $u = t, t-1$, for $v = s, s-1$ and for values of
$\alpha$ in any compact set, where $C > 0$ and $0 < r < 1$.

Then the MLE of $\alpha$, $\rho$, and $\Sigma$ are consistent, asymptotically normally
distributed, and efficient in the MP sense.

Proof:

Since the variables $\{x_t\}$ are distributed independently of the
error terms, and their joint density is not a function of the unknown
parameters, the second partial derivatives of the log likelihood are as
computed in 3.11 through 3.20 of theorem 3.2.

The arguments that 3.11, 3.12, and 3.13 satisfy the conditions of
theorems 1.1 and 1.2 are identical to the arguments in the proof of
theorem 3.2 since these terms do not involve the exogenous variables.
By assumption 2 and the independence of $\{\epsilon_t\}$ and $\{x_t\}$, the
expected value of 3.14 is zero. The covariance of the third and fourth
terms of 3.14 is


a (1-p  )n  7 h (x  ;a)  (E-1)  . h,  (x  ;u )T( T.'1 ) . 

J t=2  k -t  ~ . l x -t  - .] 

where $h_k(x_t;\alpha) = g_k'(x_t;\alpha) - \rho g_k'(x_{t-1};\alpha)$.

By assumption 2, the above term goes to zero as $n \to \infty$ uniformly in a
compact set of the parameters. The same argument applies to all terms
in the variance of 3.14. The expectation and variance of 3.15 behave
similarly.

By assumptions 2 and 3, and the independence of $\{x_t\}$ and $\{\epsilon_t\}$,
the expected value of 3.16 exists and is equal to


\[ (1-\rho^2)^{-1} n^{-1} \sum_{t=2}^{n} E\{ [g_i'(x_t;\alpha) - \rho g_i'(x_{t-1};\alpha)]^T \Sigma^{-1} [g_j'(x_t;\alpha) - \rho g_j'(x_{t-1};\alpha)] \} \]


By assumption 1, the expected value of 3.16 converges as $n \to \infty$
uniformly in a compact set of the parameters. To compute the variance
of 3.16, we note that the variance of the third term is


(l-p^)n"2  7 E[r  !^(x+  ;ct)  - pg!_!  ( 
t-2 


*L3  -t’^'  - pp4-^t.rf‘)] 


which, by assumption 3, converges to zero in the manner required.
Assumption 4 implies that the fourth term of 3.16 behaves similarly.
Since the covariance of the third and fourth terms of 3.16 is zero,
we may conclude that the variance of 3.16 converges to zero uniformly
in a compact set of the parameters.

Since terms 3.17 and 3.18 are not functions of $\{x_t\}$, their
expectations and variances converge as in the proof of theorem 3.2.

The independence of the sequences $\{x_t\}$ and $\{\epsilon_t\}$ and assumption
2 imply that the expected value of 3.19 is zero. The variance of the
first term of 3.19 is


\[ 4\rho^2 (1-\rho^2)^{-3} n^{-2} \sum_{t=2}^{n} E\{ [g_i'(x_t;\alpha) - \rho g_i'(x_{t-1};\alpha)]^T \Sigma^{-1} [g_i'(x_t;\alpha) - \rho g_i'(x_{t-1};\alpha)] \} \]


which,  by  assumption  2,  converges  to  zero  in  a compact  set  of  the 
parameters.  Assumption  2 implies  that  the  variance  of  the  second  term 
behaves  similarly.  The  variance  of  the  third  term  is 


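\[ (1-\rho^2)^{-2} n^{-2} \sum_{t=2}^{n} \sum_{s=2}^{n} \rho^{|t-s|} E\{ [g_i'(x_t;\alpha) - \rho g_i'(x_{t-1};\alpha)]^T \Sigma^{-1} [g_i'(x_s;\alpha) - \rho g_i'(x_{s-1};\alpha)] \} \]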


Lemma  1.1  and  assumption  2 imply  that  the  above  term  goes  to  zero  as 
required.  The  covariance  of  the  second  and  third  terms  of  3.19  is 


o i o 41”1  ^ _ -i 

, , 2,-1  “2  r \ S-l-t 

(l-P  ) n l l p 
t=2  s=t+l 


which, by assumption 2, converges to zero as required. By the same
argument, the covariance of the first and third terms of 3.19 converges
to zero as $n \to \infty$. Therefore the variance of 3.19 satisfies the
conditions of theorems 1.1 and 1.2.

By  the  arguments  of  theorem  3.2,  the  expectation  and  variance  of 
3.20  satisfy  the  conditions  of  theorems  1.1  and  1.2. 

Theorem 1.3 implies that the MLE of $\alpha$, $\rho$ and $\Sigma$ have the
properties required.

3.6  Optimal Properties when Independence Between Exogenous Variables
and Error Variables is not Assumed


When independence between the exogenous variables and the error
terms is not assumed, the joint conditional distribution of $\epsilon_1, \ldots, \epsilon_n$
given $x_1, \ldots, x_n$ must be known in order to obtain the MLE of the
unknown parameters. For simplicity we are assuming that the
distributional parameters of the marginal density of $x_1, \ldots, x_n$ are not
functions of the parameters to be estimated. Also, we will assume that
the joint distribution of $(\epsilon_1, \ldots, \epsilon_n, x_1, \ldots, x_n) = (\epsilon(n), x(n))$ is
multivariate normal, of dimension $n(L+K)$ with mean vector $0$ and
covariance matrix


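3.21   \[ \begin{pmatrix} \Sigma_{\epsilon} & \Sigma_{\epsilon x} \\ \Sigma_{x \epsilon} & \Sigma_{x} \end{pmatrix} \]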

where the covariance matrix is partitioned conformably with $\epsilon(n)$ and
$x(n)$. The joint conditional distribution of the error variables, given
the exogenous variables, is multivariate normal of dimension $nL$ with

AD-A042  592  CORNELL  UNIV  ITHACA  N Y SCHOOL  OF  OPERATIONS  RESEARC— ETC  F/G  5/3 


OPTIMAL  ASYMPTOTIC  PROPERTIES  OF  MAXIMUM  LIKELIHOOD  ESTIMATORS  — ETC(U) 
MAY  77  M K VICKERS  DAAG29-73-C-0008 

UNCLASSIFIED  TR-334  NL 



mean

\[ c(n) = \Sigma_{\epsilon x} \Sigma_{x}^{-1} x(n) \]


and  covariance  matrix 


\[ \Omega(n) = \Sigma_{\epsilon} - \Sigma_{\epsilon x} \Sigma_{x}^{-1} \Sigma_{x \epsilon} \]


The conditional distribution of $y_1, \ldots, y_n$ given $x(n)$ is multivariate
normal with mean $g(n;\alpha) + c(n)$, where

\[ g(n;\alpha)^T = (g(x_1;\alpha)^T, \ldots, g(x_n;\alpha)^T). \]

The covariance matrix is $\Omega(n)$. For future reference, the t'th
component vector of $c(n)$ will be denoted $c_t$, $t=1,\ldots,n$.
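
The conditional mean and covariance of jointly normal variables can be made concrete with a small numerical sketch. The code below is illustrative only: it assumes a scalar error and a two-dimensional exogenous vector (so L = 1, K = 2, n = 1), and all function names and numbers are hypothetical. It computes the regression coefficients Sigma_ex Sigma_x^{-1}, the conditional mean c, and the conditional variance Sigma_e - Sigma_ex Sigma_x^{-1} Sigma_xe.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix given as a list of two rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def conditional_moments(sigma_e, sigma_ex, sigma_x, x):
    """Conditional mean and variance of a scalar normal error given x.

    sigma_e  : variance of the error (scalar)
    sigma_ex : covariances of the error with the two components of x
    sigma_x  : 2x2 covariance matrix of x
    x        : observed value of the exogenous vector (mean zero assumed)
    """
    inv = inverse_2x2(sigma_x)
    # regression coefficients sigma_ex * sigma_x^{-1}
    coef = [sum(sigma_ex[k] * inv[k][j] for k in range(2)) for j in range(2)]
    mean = sum(coef[j] * x[j] for j in range(2))
    # conditional variance sigma_e - sigma_ex * sigma_x^{-1} * sigma_xe
    var = sigma_e - sum(coef[j] * sigma_ex[j] for j in range(2))
    return coef, mean, var


if __name__ == "__main__":
    coef, mean, var = conditional_moments(
        2.0, [1.0, 0.0], [[2.0, 0.0], [0.0, 1.0]], [4.0, 3.0]
    )
    print(coef, mean, var)
```

Note that the conditional variance does not depend on the observed x, which is why the conditional covariance matrix can be treated as a parameter to be estimated.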

Without the assumption that $\{\epsilon_t\}$ and $\{x_t\}$ are jointly normally
distributed, the conditional distribution is difficult to obtain. The
assumption that the exogenous variables have $0$ mean is made so that
the MLE of $\alpha$ is not a function of this unknown mean. The exogenous
variables are not required to be identically distributed. We will
estimate the conditional covariance matrix $\Omega(n)$ of the error terms,
under two assumptions. It is not necessary to assume that the error
variables are identically distributed or that they are independent.
Instead, the joint distribution of $\epsilon(n)$ and $x(n)$ will be assumed
to have the following properties:


1)  $\mathrm{var}(\epsilon_t - c_t) = \mathrm{var}(\epsilon_s - c_s)$  for $t,s = 1,2,\ldots$

2)  $\mathrm{cov}(\epsilon_t, \epsilon_s) = \mathrm{cov}(\epsilon_t, c_s) = \mathrm{cov}(c_t, \epsilon_s) = \mathrm{cov}(c_t, c_s)$
for $t \neq s$, $t,s = 1,2,\ldots$


The vector $c(n)$ is that vector in the linear subspace spanned
by $x(n)$ which maximizes the covariance with $\epsilon(n)$. Stated
differently, the regression of $\epsilon(n)$ on $x(n)$ produces $\Sigma_{\epsilon x} \Sigma_{x}^{-1}$ as the
vector of coefficients. The vector difference $\epsilon(n) - c(n)$ is
uncorrelated with $x(n)$. The second assumption above implies that the
covariance between different disturbance terms is completely
attributable to variation in their projection on the subspace spanned by $x(n)$.
The first condition assures the homoscedasticity of that part of the
error term which is not attributable to $x(n)$. Under these assumptions,
the covariance matrix $\Omega(n)$ is block diagonal; the covariance matrices
which form the blocks are equal, say, to $\Omega$. The log likelihood function
is


3.23   \[ -(n/2) \log|\Omega| - (1/2) \sum_{t=1}^{n} [y_t - g(x_t;\alpha) - c_t]^T \Omega^{-1} [y_t - g(x_t;\alpha) - c_t] \]

plus a term which is not a function of the unknown parameters.


Theorem 3.5

For the model 3.1, assume that the joint distribution of $\epsilon(n)$ and
$x(n)$ is multivariate normal with mean $0$ and covariance matrix given
by 3.21. Assume that conditions 1 and 2 above hold for the covariance
matrix. Assume also that


3)  The elements of $|E(h(x_t;\alpha) h(x_t;\alpha)^T)|$ are bounded above
for $t=1,2,\ldots$ and for $\alpha$ in any compact set, where the
function $h$ is one of $g_k'$ or $g_{ij}''$. Also, the elements
of $|E(g_k'(x_t;\alpha) g_{ij}''(x_s;\alpha)^T)|$ are bounded above for
$t,s = 1,2,\ldots$, for $\alpha$ in any compact set.

4)  $n^{-1} \sum_{t=1}^{n} E(g_i'(x_t;\alpha)^T \Omega^{-1} g_j'(x_t;\alpha))$ converges as $n \to \infty$,
uniformly for $\alpha, \Omega$ in any compact set, for $i,j=1,\ldots,p$.


5)  $|\mathrm{cov}[g_i'(x_t;\alpha)^T \Omega^{-1} g_j'(x_t;\alpha),\; g_i'(x_s;\alpha)^T \Omega^{-1} g_j'(x_s;\alpha)]| < C \eta^{|t-s|}$
for $i,j=1,\ldots,p$, for $t,s=1,2,\ldots$, where $C > 0$ and
$0 < \eta < 1$.

6)  $n^{-1} \sum_{t=1}^{n} E[h(x_t;\alpha) \epsilon_t^T]$, $n^{-1} \sum_{t=1}^{n} E[h(x_t;\alpha) c_t^T]$ converge, as
$n \to \infty$, uniformly for $\alpha$ in any compact set. This
condition holds when $h$ is one of $g_i'$ or $g_{ij}''$, for
$i,j = 1,\ldots,p$.

7)  For $t,s=1,2,\ldots$, the absolute value of each element of
the matrices $E(h(x_t;\alpha) \epsilon_s^T)$, $E(h(x_t;\alpha) c_s^T)$ is less than
$C \eta^{|t-s|}$, where $C, \eta$ are defined in condition 5, and $h$
in condition 6.

Then the MLE of $\alpha, \Omega$ are consistent, asymptotically normally distributed
and efficient in the MP sense.

Proof: 

The second partial derivatives are of the form 3.2 through 3.7,
where a change of notation is made from $\Sigma$ to $\Omega$, and from $\epsilon_t$ to
$u_t = \epsilon_t - c_t$. For future reference, $E(u(n)) = 0$ and $E(u(n) u(n)^T) = \Omega(n)$,
so that

The expected value of the term of the form 3.2 is


hi 

w 


jk 


* 


hj 


where $\omega^{ij}$ is the i,j'th element of $\Omega^{-1}$. The convergence of the
variance of this term to zero as $n \to \infty$ uniformly for $\alpha$ in a compact
set, also follows from 3.24. The expectations and variances of the
terms corresponding to 3.3 and 3.4 behave similarly, by the same
arguments.

Assumption 6 implies that the expectation of the term
corresponding to 3.5 converges, as $n \to \infty$, uniformly for $\alpha, \Omega$ in a compact set.

A representative  term  in  the  variance  is 


3.25 


var[n  1 l |L.(x  ;a)T(^  1)  • ujui”1) 

— t — . l -t  O 


t=l 
n n 


= n~2  l l [(ft”1).  E(£^(x  -,ct)  gj(x  ;a)T)(n  X)  . ]* 

t=l  s=l  x*  ^ 1 ^ ^ -1 


[(a"1).  E(u,uT)(f2_1)  .] 
3 . — t-s  . ] 


+ n ~2  l l [(tl  1).  E(^(x.;a)u!)(«'i)  .3* 

t=l  s=l  x*  "*  "*  ~“* 

[(ft  ^ E(g,’<.(xs;a)u^)(fi  1)  ^ 3 


Tw  -1, 


The first term of 3.25 goes to zero as $n \to \infty$ by assumptions 1, 2, and
3, or, equivalently, 3.24. The second term of 3.25 goes to zero as
$n \to \infty$ by assumption 7. For both terms, the convergence is uniform in
a compact set of $\alpha, \Omega$. By these arguments each term in the variance
goes to zero as required. The term corresponding to 3.6 follows
similarly.

The expected value of the term corresponding to 3.7 is


-In  rr  " / xT  -1  n 

-n  I E[g.  (x  ;a)  ft  r ] 

t-1 


-IV  - . T -1  XT 

+ n l Llg.  (•<  ;a)  ft  g,  (xt;a)J 

t=l 


Assumptions 4 and 6 imply the uniform convergence of these terms for
$\alpha, \Omega$ in a compact set. Assumptions 1, 2, 3, 5, and 7 imply that the
variance goes to zero uniformly.

Applying theorem 1.3, we conclude that the MLE of $\alpha$ and $\Omega$
have the required optimal properties.


3.7  Example of a Nonlinear Model under Various Assumptions on the
Exogenous Variables and the Error Variables


An interesting example of a nonlinear model which is econometric
in origin is proposed by Joreskog and Goldberger [1975]. The model
arises as follows. The scalar variable $w_t$ is determined by

3.26   \[ w_t = \alpha^T x_t + v_t \quad \text{for } t=1,2,\ldots \]

where $x_t$ and $\alpha$ are $K \times 1$ vectors and $v_t$ is normally distributed with
mean 0 and variance $\sigma^2$. The $L \times 1$ vector $y_t$ is determined as

3.27   \[ y_t = \beta w_t + u_t \quad \text{for } t=1,2,\ldots \]

where $\beta$ and $u_t$ are $L \times 1$ vectors, and $u_t$ is normally distributed
with mean $0$ and covariance matrix $\Theta$. Combining 3.26 and 3.27, the
authors obtain the model

\[ y_t = \beta \alpha^T x_t + \epsilon_t \quad \text{for } t=1,2,\ldots \]

where $\epsilon_t$ has the multivariate normal density of dimension $L$ with mean
$0$ and covariance matrix $\Sigma = \sigma^2 \beta \beta^T + \Theta$. The indeterminacy in the
model parameters is removed by setting $\sigma^2 = 1$. The authors determine
the MLE of the parameters when the exogenous variables are nonrandom
and again when the exogenous variables are random and vary jointly
with $\{y_t\}$.
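
The structure of this model, and the reason a normalization is needed, can be illustrated with a short sketch. It is not taken from Joreskog and Goldberger; the function names and numerical values are hypothetical. The code evaluates the conditional mean beta * (alpha^T x_t) and the error covariance sigma2 * beta beta^T + Theta, then shows that rescaling beta by k, alpha by 1/k and sigma2 by 1/k**2 leaves both unchanged.

```python
def implied_mean(beta, alpha, x):
    """Mean of y_t given x_t: beta * (alpha^T x_t)."""
    w = sum(a * xi for a, xi in zip(alpha, x))
    return [b * w for b in beta]


def implied_covariance(beta, theta, sigma2=1.0):
    """Sigma = sigma2 * beta beta^T + Theta, with Theta an L x L list of rows."""
    size = len(beta)
    return [
        [sigma2 * beta[i] * beta[j] + theta[i][j] for j in range(size)]
        for i in range(size)
    ]


if __name__ == "__main__":
    theta = [[0.5, 0.0], [0.0, 0.25]]
    x = [4.0, 1.0]
    # Normalized parameters (sigma2 = 1)
    print(implied_mean([1.0, 2.0], [0.5, -1.0], x))
    print(implied_covariance([1.0, 2.0], theta))
    # Rescaled parameters with k = 2 give the same mean and covariance
    print(implied_mean([2.0, 4.0], [0.25, -0.5], x))
    print(implied_covariance([2.0, 4.0], theta, sigma2=0.25))
```

Because the two parameter sets generate identical distributions for the observations, the likelihood cannot distinguish them unless a restriction such as sigma2 = 1 is imposed.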

The  theorems  of  this  chapter  may  be  applied  to  obtain  optimal 
properties  of  the  MLE  under  various  assumptions. 

1)  If the exogenous variables are nonrandom and bounded
above, $n^{-1} \sum_{t=1}^{n} x_t x_t^T$ converges, and the error terms are
independently distributed, then the optimal properties
of the MLE hold. This conclusion follows from theorem
3.1.

2)  If the error variables form a multivariate process with
mean $0$ and covariance function $\rho^{|s|} \Sigma$, where $0 \leq \rho < 1$,
the conditions involving $x_t$ in 1 are satisfied, and if
$n^{-1} \sum_{t=2}^{n} x_t x_{t-1}^T$ converges, the optimal properties of the
MLE are guaranteed by theorem 3.2.


3)  If $\epsilon_t$ and $x_t$ are jointly distributed as multivariate
normal deviates with mean 0 and covariance matrix

\[ \begin{pmatrix} \Omega & \Psi \\ \Psi^T & \Phi \end{pmatrix} \]

for all t, where the covariance matrix is partitioned
conformably with $\epsilon_t$ and $x_t$, and $\epsilon_t, x_t$ are serially
independent, then the optimality properties of the MLE
for $\alpha$, $\beta$, and $\Omega - \Psi \Phi^{-1} \Psi^T$ follow from theorem 3.5. A
sequence of random variables $\{b_t\}$ is said to be serially
independent if, for any t and any s, $t \neq s$, $b_t$ and
$b_s$ are independently distributed.


CHAPTER  IV:  SYSTEMS  OF  EQUATIONS


4.1  Results from the Literature

The results for estimation of multivariate models presented in
the previous chapters contribute to the theory of optimal estimation
for simultaneous equation systems. The distinction to be made
between general multivariate models and systems of simultaneous
equations in structural form involves the estimation of coefficients
of the endogenous variables in the latter model. Assumptions must
be made to guarantee that theorem 1.4 applies, so that optimal
estimators for the parameters of the structural model may be
obtained from optimal estimators of the reduced form model. In
this chapter, we will cite results concerning optimality properties
of MLE from the literature, and modify theorems from the previous
chapters to cover estimation of structural coefficients of
systems.

Koopmans and Hood [1953] investigated the structural equation
model

4.1   \[ B y_t + \Gamma z_t = u_t \quad \text{for } t=1,2,\ldots \]

where $B$ is an $L \times L$ nonsingular matrix of unknown parameters, $\Gamma$ is
an $L \times K$ matrix of unknown parameters, $y_t$ and $u_t$ are $L \times 1$ vector
variables, and $z_t$ is a $K \times 1$ vector of either lagged values of the
endogenous variables or exogenous variables. The joint density of
the elements of $u_t$ is multivariate normal with mean $0$ and
covariance matrix $\Sigma$. The disturbance terms $u_t$ and $u_s$ are
independent for $t \neq s$ and are independent of all $z_t$. The authors
assume that all equations are identified; a detailed discussion of
this concept is contained in section 4.2. Let $r_t^T = (y_t^T, z_t^T)$.

Let $M_{ab}$ be the moment matrix of the variables $a$ and $b$ defined by

4.2   \[ M_{ab} = n^{-1} \sum_{t=1}^{n} a_t b_t^T \]

Let $W_{rr}$ be defined by

4.3   \[ W_{rr} = M_{rr} - M_{rz} M_{zz}^{-1} M_{zr} \]

Then the MLE $\hat{A}$ of $A = [B, \Gamma]$ is shown to be that value of $A$ which
minimizes

4.4   \[ V = \frac{\det(A M_{rr} A^T)}{\det(A W_{rr} A^T)} \]

The MLE of the covariance matrix $\Sigma$ is given by

4.5   \[ \hat{\Sigma} = \hat{A} M_{rr} \hat{A}^T \]

Assuming that $M_{zz}$ approaches a nonsingular limit matrix in
probability, the MLE are shown to be consistent and asymptotically
normally distributed.

4.2  Applications of Theorems for Nonlinear Models to Linear
Systems of Equations

We shall apply the results pertaining to nonlinear models in
Chapter III to model 4.1 to prove optimality properties. The
structural model 4.1 is transformed to its reduced form

4.6   \[ y_t = \Pi z_t + \epsilon_t \quad \text{for } t=1,2,\ldots \]

where $\Pi = -B^{-1} \Gamma$ and $\epsilon_t = B^{-1} u_t$. The density of the vector
$\epsilon_t$ is multivariate normal with mean $0$ and covariance matrix
$\Omega = B^{-1} \Sigma (B^{-1})^T$.
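
The passage from structural to reduced form parameters can be sketched numerically. The code below is an illustration under simplifying assumptions (a two-equation system with two exogenous variables, so that a closed-form 2x2 inverse suffices); the helper names and the numerical values are hypothetical and not part of the original text. It computes Pi = -B^{-1} Gamma and Omega = B^{-1} Sigma (B^{-1})^T.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix given as a list of two rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("B must be nonsingular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def reduced_form(B, Gamma, Sigma):
    """Return (Pi, Omega) for the structural system B y + Gamma z = u."""
    B_inv = inverse_2x2(B)
    Pi = [[-v for v in row] for row in matmul(B_inv, Gamma)]
    Omega = matmul(matmul(B_inv, Sigma), transpose(B_inv))
    return Pi, Omega


if __name__ == "__main__":
    B = [[1.0, 0.5], [0.0, 1.0]]
    Gamma = [[1.0, 0.0], [0.0, 2.0]]
    Sigma = [[1.0, 0.0], [0.0, 1.0]]
    Pi, Omega = reduced_form(B, Gamma, Sigma)
    print(Pi)
    print(Omega)
```

A quick consistency check is that B Pi + Gamma is the zero matrix, which restates that the reduced form is obtained by solving the structural equations for the endogenous variables.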

Following the standard usage, the equations of a system will be
called identified if knowledge of the joint distribution of the
observations $y_1, \ldots, y_n$ implies knowledge of all unknown parameters
in the model. The definition may also be phrased as the existence
of a one-to-one relationship between the parameters of the reduced
equations and the parameters of the structural equations. Several
authors give necessary and sufficient conditions for the identifiability
of a single equation. For a particular equation of the system, let
the variables be partitioned as


) 


) 


) 


where $v_t$ is the H-dimensional subvector of $y_t$ whose elements
have nonzero coefficients in the equation, $x_t$ is the D-dimensional
subvector of $z_t$ whose elements have nonzero coefficients in the
equation chosen, and $\epsilon_t$ is partitioned similarly to $y_t$.

It is customary to assume, without loss of generality, that
one element of $v_t$ occurs with coefficient restricted to be one.
Equivalently, it could be assumed that a quadratic form in the
coefficients of $v_t$ is restricted to be a constant.

The  reduced  form  model  may  be  written  as 


where the matrix $\Pi$ is conformably partitioned. The equation
chosen is identified if and only if the rank of $\Pi$ is
$H-1$. A necessary condition for identification is that the number
of linear restrictions on the parameters of the equation is equal
to the number of equations in the system, less one. Consideration
will be limited to those linear restrictions setting certain
parameters of the structural equations equal to zero. If every
equation of the system is identified, there exists a one-to-one map

\[ T: (\Pi, \Omega) \rightarrow (B, \Gamma, \Sigma) \]

This map clearly satisfies the smoothness conditions of theorem 1.4.
Following are four theorems with various distributional assumptions
on the exogenous and error variables.

Theorem 4.1

For the model 4.1, assume that the exogenous variables
$\{z_t, t=1,2,\ldots\}$ are nonrandom, and the error variables
$\{u_t, t=1,2,\ldots\}$ are independently distributed, each term having
the multivariate normal distribution with expectation $0$ and
nonsingular covariance matrix $\Sigma$. Assume that $B$ is nonsingular
and that each equation is identified. Assume also that

1)  $\{z_t, t=1,2,\ldots\}$ is contained in a bounded set.

2)  $\lim_{n \to \infty} n^{-1} \sum_{t=1}^{n} z_t z_t^T$ exists.

Then the MLE of $B$, $\Gamma$, and $\Sigma$ are consistent, asymptotically
normally distributed, and efficient in the MP sense.

Proof:

Theorem 3.1 may be applied to model 4.6. In this case, $g(z_t;\Pi)$
is the $L \times 1$ vector with i'th element $(\Pi)_{i\cdot} z_t$. Then the first
partial derivatives are

\[ \partial g(z_t;\Pi) / \partial (\Pi)_{ij} = (0, \ldots, (z_t)_j, \ldots, 0)^T \]

where the only nonzero element is the i'th. Thus condition 1 of
theorem 3.1 requires that

\[ (\Omega^{-1})_{ik} \, n^{-1} \sum_{t=1}^{n} (z_t)_j (z_t)_m \]

converges, as $n \to \infty$, for $j,m=1,\ldots,K$ and $i,k=1,\ldots,L$. The
second condition requires that $\{z_t, t=1,2,\ldots\}$ be contained in a
bounded set. The third condition is not restrictive since $g$ is
a multilinear function. Since the conditions of theorem 3.1 hold,
the MLE of $\Pi$ and $\Omega$ have the optimality properties claimed. By
theorem 1.4, the MLE of $B$, $\Gamma$, and $\Sigma$ also enjoy these optimality
properties.
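
The form of the partial derivatives used in this proof can be verified with a small numerical sketch. This is only an illustration with hypothetical names and values: g(z; Pi) = Pi z is evaluated directly, its derivative with respect to the (i, j) element of Pi is taken to be the vector with z_j in position i and zeros elsewhere, and the result is compared with a forward finite difference.

```python
def g(Pi, z):
    """Reduced form mean g(z; Pi) = Pi z, with Pi stored as a list of rows."""
    return [sum(p * zj for p, zj in zip(row, z)) for row in Pi]


def dg_dPi(i, j, z, n_rows):
    """Analytic derivative of g with respect to Pi[i][j]:
    z[j] in position i and zeros elsewhere."""
    return [z[j] if r == i else 0.0 for r in range(n_rows)]


def finite_difference(Pi, z, i, j, h=1e-6):
    """Forward-difference approximation to the same derivative."""
    bumped = [row[:] for row in Pi]
    bumped[i][j] += h
    return [(a - b) / h for a, b in zip(g(bumped, z), g(Pi, z))]


if __name__ == "__main__":
    Pi = [[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]]
    z = [2.0, -1.0, 4.0]
    print(dg_dPi(1, 2, z, len(Pi)))
    print(finite_difference(Pi, z, 1, 2))
```

Because g is linear in each element of Pi, the second derivatives with respect to Pi vanish, which is why the condition on second derivatives imposes no restriction here.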

Theorem 4.2

Assume for model 4.1 that the exogenous variables are nonrandom
and the error variables form a multivariate stationary process
whose density is multivariate normal with mean $0$ and covariance
function $\rho^{|s|} \Sigma$ where $\rho, \Sigma$ are unknown, $0 < \rho < 1$, and $\Sigma$ is
positive definite. Assuming conditions 1 and 2 of theorem 4.1 and

3)  $\lim_{n \to \infty} n^{-1} \sum_{t=2}^{n} z_t z_{t-1}^T$ exists

the MLE of $B$, $\Gamma$, $\Sigma$ and $\rho$ are consistent, asymptotically normally
distributed and efficient in the MP sense.

Proof:

Theorem 3.2 is applied to model 4.6. Therefore the MLE of
$\Pi, \rho, \Omega$ have the optimality properties required. Applying theorem
1.4, the MLE of $B$, $\Gamma$, $\Sigma$, and $\rho$ have the required properties.

Theorem 4.3

For model 4.1, assume the exogenous variables $\{z_t, t=1,2,\ldots\}$
are random and identically distributed, independently of the error
terms, which are independently distributed, multivariate normal,
with mean $0$ and nonsingular covariance matrix $\Sigma$. Assume also
that

1)  All components of $|E(z_t z_s^T)|$ are bounded above,
uniformly for $t,s=1,2,\ldots$

2)  $|\mathrm{cov}((z_t)_i (z_t)_j,\; (z_s)_i (z_s)_j)| \leq C \eta^{|t-s|}$ for $i,j = 1,\ldots,K$,
where $C$ is a nonnegative constant and $0 < \eta < 1$.

Then the MLE of $B$, $\Gamma$, and $\Sigma$ are consistent, asymptotically normally
distributed, and efficient in the MP sense.

Proof:

For the reduced model 4.6, condition 1 above implies condition 1
of theorem 3.3. Condition 2 of theorem 3.3 is satisfied since
$g(z_t;\Pi)$ is multilinear. Condition 2 above implies condition 3
of theorem 3.3. Therefore the MLE of $\Pi$ and $\Omega$ have the optimality
properties required. Application of theorem 1.4 implies that the MLE of
$B$, $\Gamma$, and $\Sigma$ have the optimality properties required.

Theorem 4.4

For model 4.1, assume that the exogenous variables
$\{z_t, t=1,\ldots\}$ are random and distributed independently of the
error terms, which form a multivariate stationary process. The
common distribution of each $u_t$ is multivariate normal with mean
$0$ and nonsingular covariance matrix $\Sigma$. The covariance function
of the process is $\rho^{|s|} \Sigma$, where $0 < \rho < 1$. Assume also that

1)  $\lim_{n \to \infty} n^{-1} \sum_{t=1}^{n} E(z_t z_t^T)$, $\lim_{n \to \infty} n^{-1} \sum_{t=2}^{n} E(z_t z_{t-1}^T)$ exist.

2)  The components of $|E(z_t z_s^T)|$ are bounded above
uniformly for $t,s=1,2,\ldots$

3)  $|\mathrm{cov}[(z_u z_u^T)_{ij},\; (z_v z_v^T)_{hk}]| \leq C r^{|t-s|}$ for $i,j,h,k = 1,\ldots,K$,
where $u=t,t-1$ and $v=s,s-1$, for $t,s=1,2,\ldots$, where
$C > 0$ and $0 < r < 1$.

Then the MLE of $B$, $\Gamma$, $\rho$, and $\Sigma$ are consistent, asymptotically
normally distributed, and efficient in the MP sense.

Proof:

Theorem 3.4 is applied to model 4.6 to yield asymptotic
properties for $\Pi$, $\rho$ and $\Omega$. Theorem 1.4 implies that the same
properties hold for $B$, $\Gamma$, $\rho$, and $\Sigma$.

4.3  Systems with Lagged Dependent Variables

Hood and Koopmans [1953] require only that the variable $z_t$
be independent of the error term $u_t$. This condition is
satisfied if lagged values of $y_t$ are included among the elements
of $z_t$ and the disturbance terms are independent. However,
theorems 4.1 through 4.4 do not allow lagged values of the dependent
variables. To include this case, theorem 2.3 is applied to the
reduced form of the model. The model in structural form will be
denoted by

4.7   \[ B_0 y_t + B_1 y_{t-1} + \cdots + B_G y_{t-G} + \Gamma z_t = u_t, \quad t=1,2,\ldots \]

where $B_0, B_1, \ldots, B_G$ are $L \times L$ matrices of unknown parameters; $B_0$ is
nonsingular; $\Gamma$ is an $L \times K$ matrix of unknown parameters. The vectors
$y_t$ and $u_t$ are $L \times 1$; the exogenous variable $z_t$ is $K \times 1$ dimensioned. The
density of each error is multivariate normal with mean vector
$0$ and nonsingular covariance matrix $\Sigma$. The reduced form of 4.7 is


4.8   \[ y_t = C_1 y_{t-1} + \cdots + C_G y_{t-G} + D z_t + \epsilon_t, \quad t=1,2,\ldots \]

where $C_i = -B_0^{-1} B_i$ for $i=1,\ldots,G$, $D = -B_0^{-1} \Gamma$, and $\epsilon_t = B_0^{-1} u_t$
for $t=1,2,\ldots$ . The covariance matrix of $\epsilon_t$ is $\Omega = B_0^{-1} \Sigma (B_0^{-1})^T$.
To estimate the structural parameters, each equation of the
system must be identified. The transformation defined by

\[ V: (C_1, \ldots, C_G, D, \Omega) \rightarrow (B_0, \ldots, B_G, \Gamma, \Sigma) \]

is one-to-one and satisfies the assumptions of theorem 1.4.

Theorem 4.5

For the model 4.7, assume that the exogenous variables
$\{z_t, t=1,\ldots\}$ are nonrandom and the error terms $\{u_t, t=1,\ldots\}$ are
independently distributed, with multivariate normal density
with mean $0$ and nonsingular covariance matrix $\Sigma$. Assume $B_0$
is nonsingular, each equation of the system is identified,
and the stability condition holds. Assume also that

1)  $\{z_t, t=1,\ldots\}$ is contained in a bounded set.

2)  $\lim_{n \to \infty} n^{-1} \sum_{t=1}^{n-c} z_t z_{t+c}^T$ exists, and the convergence is
uniform in $c$, $c=0,1,\ldots$ .

Then the MLE of $B_0, B_1, \ldots, B_G, \Gamma$, and $\Sigma$ are consistent,
asymptotically normally distributed, and efficient in the MP sense.

Proof:

Theorem 2.3 may be applied to model 4.8 to show that the MLE
of the reduced form have the optimal properties required. Since the
conditions of theorem 1.4 are satisfied by $V$, the MLE of the
structural parameters have the desired properties.

Optimality  results  can  also  be  derived  if  the  exogenous 
variables  are  random. 

Theorem 4.6

For the model 4.7, assume that the exogenous variables are
identically distributed, independently of the error sequence
$\{u_t, t=1,\ldots\}$. The error variables have the same distribution as
specified in theorem 4.5. Assume that $B_0$ is nonsingular, each
equation of the system is identified, and the stability condition
holds. Assume also that conditions 1, 2, and 3 of theorem 2.2 hold
when x is replaced by z. Then the MLE of $B_0, B_1, \ldots, B_G, \Gamma$, and
$\Sigma$ are consistent, asymptotically normally distributed and
efficient in the MP sense.
Proof:

The  assumptions  of  theorems  2.4  and  1.4  are  satisfied;  the 
result  follows. 


4.4 Optimality Properties when Independence between the Error
Variables and the Exogenous Variables is not Assumed

Dependence between the variables $\{z_t, t=1,\ldots\}$ and
$\{u_t, t=1,\ldots\}$ may also be allowed in cases other than those
when components of $z_t$ are lagged values of $y_t$. To obtain
optimal properties for the MLE in these more general cases, theorem
3.5 will be adapted for systems of equations.


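Theorem 4.7

For model 4.6, assume that the joint distribution of $\varepsilon(n)$ and
z(n) is multivariate normal with mean zero and covariance matrix

$\begin{pmatrix} \Sigma_{\varepsilon}(n) & \Sigma_{\varepsilon z}(n) \\ \Sigma_{z\varepsilon}(n) & \Sigma_{z}(n) \end{pmatrix}$.

The conditional mean of $\varepsilon(n)$ given z(n) is defined to be

$c(n) = \Sigma_{\varepsilon z}(n)\, \Sigma_{z}(n)^{-1} z(n)$.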

Assume  the  following: 

1) $\mathrm{var}(\varepsilon_t - c_t) = \mathrm{var}(\varepsilon_s - c_s)$ for all t,s=1,...

2) $\mathrm{cov}(\varepsilon_t, \varepsilon_s) = \mathrm{cov}(c_t, c_s)$ for all $s \ne t$.

3)  The  elements  on  the  diagonal  of  E^(n)  are  bounded  above 
in absolute value, uniformly for all n.
4) $n^{-1} \sum_{t=1}^{n} E(z_t z_t^T)$ converges as $n \to \infty$.

5) $|\mathrm{cov}((z_t)_i (z_t)_j, (z_s)_i (z_s)_j)| \le C \eta^{|t-s|}$ for all i,j=1,...,K,
for all t,s, where C > 0 and $0 \le \eta < 1$.

6) $n^{-1} \sum_{t=1}^{n} E(z_t \varepsilon_t^T)$ converges as $n \to \infty$.

7) $|E((z_t)_j (\varepsilon_s)_k)| \le C \eta^{|t-s|}$ for all t,s, for j=1,...,K,
for k=1,...,L.

Assuming conditions 1 through 7, when all equations are identified,
the MLE of B, $\Gamma$, and $\Omega(n) = \Sigma_{\varepsilon}(n) - \Sigma_{\varepsilon z}(n)\, \Sigma_{z}(n)^{-1} \Sigma_{z\varepsilon}(n)$
are consistent, asymptotically normally distributed, and efficient
in the MP sense.

Proof: 

Theorem 3.5 may be applied to the reduced model 4.6. Assumptions
1 through 7 above imply assumptions 1 through 7 of theorem 3.5.
Optimal properties of the structural parameters may be inferred from
theorem 1.4 from the optimal properties of the reduced form
parameters.

4.5 Estimation of the Parameters of a Single Equation or a Subsystem

Several  authors  have  considered  the  problem  of  estimating  the 
parameters  of  a subset  or  a single  equation  of  a system  of  equations. 
Maximum likelihood methods for this purpose are called limited
information methods, because they do not make use of restrictions
on parameters which do not occur in the subsystem. In contrast,
maximum likelihood estimation of all parameters of an identified system
is called a full information method. Previous research on
asymptotic  properties  for  the  limited  information  method  under 
very  general  distributional  assumptions  will  be  cited.  Theorems 
from  the  previous  sections  will  be  adapted  to  subsystems. 

Anderson and Rubin [1949], [1950] obtain the MLE for the
parameters of a single identified equation under the assumption
that the error terms for the complete system are normally
distributed, with mean zero and covariance matrix $\Sigma$. Asymptotic
properties for this estimator are then proven under more general
conditions. The equation under consideration is denoted as


eV  + t=i,2 , . . . 

The variables $y_t$ and $z_t$ are partitioned as before into
subvectors whose elements do appear in the chosen equation with
nonzero coefficients, and those whose elements do not. The authors
define $s_t$ to be a linear transform of $z_t$ such that $M_{xs} = 0$,
where M is defined in 4.2. The vector b is chosen to satisfy

$(M_{vs} M_{ss}^{-1} M_{sv} - \nu W_{vv})\, b = 0$

where W is defined in 4.3 and $\nu$ is the smallest root of the
determinantal equation

$|M_{vs} M_{ss}^{-1} M_{sv} - \nu W_{vv}| = 0$.


The matrix $\Phi$ is the matrix of the quadratic form normalization
for $\beta$. Then the MLE for $\beta$ is given as

$\hat\beta = b / (b^T \Phi b)^{1/2}$

and the MLE for $\gamma$ is

$\hat\gamma = -\hat\beta^T M_{vx} M_{xx}^{-1}$.

Assuming that the chosen equation is identified and

AR1) $n^{-1} \sum_{t=1}^{n} z_t z_t^T$ converges stochastically to a nonsingular
matrix R.

AR2) $n^{-1} \sum_{t=1}^{n} \delta_t z_t^T$ converges stochastically to 0.

AR3) The ratio of the largest to the smallest eigenvalues of

$W^*_{vv} = M_{vv} - (M_{vx} \;\; M_{vs}) \begin{pmatrix} M_{xx} & M_{xs} \\ M_{sx} & M_{ss} \end{pmatrix}^{-1} \begin{pmatrix} M_{xv} \\ M_{sv} \end{pmatrix}$

is bounded above in probability.

The authors prove that $\hat\beta$ and $\hat\gamma$ are consistent. Under further
assumptions  which  specify  that 


1 


AR4) 

AR5  ) 

AP.o) 

n (8_  - fl),  R (y  - y)  are  asymptotically  normally  distributed. 
No claims of efficiency are made; the covariance matrix $\Sigma$ is not
estimated. If $\Phi$, the matrix of the quadratic form normalization,
is  not  constant,  but  is  a function  of  the  covariance  matrix  of  the 
error  terms,  then  the  following  additional  assumptions  are  required 
to  conclude  asymptotic  normality: 


The sequence $\{z_t\}$ is bounded and may include
lagged endogenous variables, provided they are
bounded almost surely.

For  some  X > 0 and  for  some  M,  E(  | (o_  )-J  '+^ ) < M 
For  each  i,j,k,  and  m. 


S n_1  E«5t)i(lt)j(lt>k(it)m)  6Xi£tS’ 


AP.7) 

AR8) 


AR9) 


lim  -1 


" n * l E(  ( 6 . ) . ( 6 ) . ( 6 . ),( z ) ) 

n-*«>  “ — t i — t j — t k — t m 

n 

n Y E(  (5.). (6.).  ) exists. 
t=I  ~t  i -t  ] 

j,  M > 


lim  _-l 
n-*» 


lim  _-l 
n-r» 


t=l 


exists . 


exists . 


If it is assumed that the disturbance vectors are distributed
independently and normally with mean vector zero and covariance
matrix $\Sigma$, and $\Phi = \Sigma$, then assumptions AR1, AR2, AR4, and
AR5 are sufficient for the asymptotic normality of $\hat\beta$ and $\hat\gamma$.

Rubin and Chernoff [1953] also prove asymptotic properties for
the MLE of a subsystem, computed under the assumptions of normality
and independence of the error terms. For the model 4.1, the
variables are partitioned as before on the basis of whether or not
they appear in the subsystem. The submatrices of B, $\Gamma$ appearing in
the subsystem will be denoted $B_1$, $\Gamma_1$. The following assumptions are
required.


RC1) $n^{-1} \sum_{t=1}^{n} \delta_t \delta_t^T$ converges stochastically to a
nonsingular matrix.

RC2) $n^{-1} \sum_{t=1}^{n} z_t z_t^T$ converges stochastically to a nonsingular
matrix.

RC3) $n^{-1} \sum_{t=1}^{n} \delta_t z_t^T$ converges stochastically to zero.

RC4) $n^{-1} \sum_{t=1}^{n} E(y_t \mid y_1,\ldots,y_{t-1}, z_1,\ldots,z_t)\, z_t^T$ converges
stochastically.

RC5) $n^{-1} \sum_{t=1}^{n} [y_t - E(y_t \mid y_1,\ldots,y_{t-1}, z_1,\ldots)]\, z_t^T$ converges
stochastically to zero.

RC6) The matrix equation $[B_1, \Gamma_1] M = 0$ defines
$B_1$, $\Gamma_1$ as a single valued differentiable function of
M, for all M for which the matrix equation has a
solution which satisfies the a priori restrictions.


Assuming  conditions  RC1  through  RC6,  the  estimators  of  B^,T^,  and 
computed  by  the  ML  method,  assuming  that  the  errors  are 
independent and normally distributed, are consistent. To conclude
the asymptotic normality of $\hat{B}_1$ and $\hat{\Gamma}_1$, two further assumptions
are required.

RC7) The elements of $n^{-1/2} \sum_{t=1}^{n} \delta_t z_t^T$ are asymptotically
normally distributed.

RC8) Let $C_{ik,jm}$ represent the asymptotic covariance of
$n^{-1/2} \sum_{t=1}^{n} (\delta_t)_i (z_t)_k$ and $n^{-1/2} \sum_{t=1}^{n} (\delta_t)_j (z_t)_m$.
Then $C_{ik,jm} - [n^{-1} \sum_{t=1}^{n} (\delta_t)_i (\delta_t)_j][n^{-1} \sum_{t=1}^{n} (z_t)_k (z_t)_m]$
converges stochastically to zero, for all i,j,k,m.


We apply our theorems on the estimation of the parameters of
a complete system to the problem of estimating a subsystem. We
will suppose that the subsystem consists of the first $L_1$ equations
of the complete system 4.1, where $1 \le L_1 \le L-1$. We will assume that
each of the $L_1$ equations is identified. Let $y_t^*$ be composed of
the elements of $y_t$ which are present in the subsystem. Let $\Pi^*$
be that submatrix of $\Pi$ which consists of the first $L_1$ rows. Let
$\varepsilon_t^*$ be the subvector of $\varepsilon_t$ which consists of the first $L_1$
elements, and $\Omega^*$ be the submatrix of $\Omega$ consisting of the first
$L_1$ rows and columns. $\Omega^*$ is assumed to be invertible. If the
error terms $\varepsilon_t^*$ are independently distributed with mean vector
zero, and the exogenous variables are either nonrandom or distributed
independently of $\{\varepsilon_t^*, t=1,\ldots\}$, then the logarithm of the
likelihood function of $y_1^*, \ldots, y_n^*$ is the sum of a term which is
not a function of the unknown parameters and

$-(n/2) \log|\Omega^*| - (1/2) \sum_{t=1}^{n} (y_t^* - \Pi^* z_t)^T (\Omega^*)^{-1} (y_t^* - \Pi^* z_t)$.

By theorem 1.4, if the MLE of $\Pi^*$, $\Omega^*$ are consistent and
asymptotically normally distributed, and efficient in the MP sense,
then the MLE of the structural parameters of the subsystem also
enjoy these properties. A similar argument holds when the
$\{\varepsilon_t^*, t=1,\ldots\}$ form a multivariate stationary process.

In summary, theorems 4.1 through 4.4 hold for subsystems when
assumptions on $\{\varepsilon_t, t=1,\ldots\}$ are assumed to hold only for the
subvector $\{\varepsilon_t^*, t=1,\ldots\}$, and when only the submatrix $\Omega^*$ is
assumed to be invertible. The distributional assumptions on the
exogenous variables remain unchanged. The same statement is true
of the adaptations of theorems 4.5 and 4.6 to subsystems, if among
the components of $z_t$ are included only lagged values of $y_t^*$ and
not of the full vector $y_t$. The adaptation of theorem 4.7 follows
similarly.


4.6 Singular Systems


A singular  system  is  defined  to  be  one  in  which  the  covariance 
matrix  of  the  disturbance  terms  is  singular.  In  this  case,  the 
density  of  the  error  terms  does  not  exist,  when  the  distribution  is 
assumed  to  be  normal.  Singular  systems  result  when  a linear  combination 
of  the  components  of  the  endogenous  variables  is  constrained  to  be 
a constant.  As  an  example,  consider  the  model 


$y_t = b + \Gamma x_t + \varepsilon_t$    for t=1,...


where $\iota^T y_t = 1$ for all t, with $\iota$ being the Lx1 vector whose
components are all unity, b is Lx1 dimensioned, $\Gamma$ is LxK dimensioned,
$x_t$ is Kx1 dimensioned, and $\varepsilon_t$ is Lx1 dimensioned. Models of this
type are considered in Barten [1969], where the error vectors are
assumed to be independently normally distributed with mean vector
0 and covariance matrix $\Sigma$. Berndt and Savin [1975] consider this
model with errors generated from a stationary vector process
satisfying

$\varepsilon_t = R \varepsilon_{t-1} + u_t$    for t=2,3,...

where the $\{u_t\}$ are independently, identically distributed,
multivariate normal with mean 0 and covariance matrix $\Sigma$. The
constraint that $\iota^T y_t = 1$ implies

4.9    $\iota^T b = 1$, $\iota^T \Gamma = 0$, $\iota^T \varepsilon_t = 0$.

The last equations imply that the covariance matrix is singular, for
both error models described above. Barten proposes that one
equation of the system be eliminated from the model, removing the
singularity from the covariance matrix. Denoting by a* the first
L-1 elements of the vector a, and by $\Gamma^*$ the first L-1 rows of
$\Gamma$, the model with independent error terms is

$y_t^* = b^* + \Gamma^* x_t + \varepsilon_t^*$    for t=1,...

where the covariance of $\varepsilon_t^*$ is $\Sigma^*$, the (L-1)x(L-1) nonsingular matrix
formed from $\Sigma$ by removing the last row and last column. To
obtain the MLE of b, $\Gamma$ from b*, $\Gamma^*$, the restrictions 4.9 are used.
Barten has derived the transformation from $\Sigma^*$ to $\Sigma$. This
transformation is

[The displayed transformation from $\Sigma^*$ to $\Sigma$ is not legible in the source.]

It is obvious that the map

$\phi: (b^*, \Gamma^*, \Sigma^*) \mapsto (b, \Gamma, \Sigma)$

is bijective and satisfies the smoothness conditions of theorem
1.4. Thus the theorems of sections 4.2 and 4.4 may be applied to
the truncated form of a singular system to obtain optimal
properties of the MLE for the full model.
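The passage from the truncated system back to the full system can be sketched in a few lines. The example below is a minimal illustration, assuming only the restrictions $\iota^T b = 1$, $\iota^T \Gamma = 0$, and $\iota^T \varepsilon_t = 0$ stated in 4.9. The function name `complete_singular_system` is hypothetical, and the covariance completion shown is the one implied algebraically by $\iota^T \varepsilon_t = 0$, which may be written differently from the form Barten displays.

```python
def complete_singular_system(b_star, gamma_star, sigma_star):
    """Recover (b, Gamma, Sigma) for all L equations from the first L-1.

    Uses i'b = 1, i'Gamma = 0, and i'e_t = 0, where i is a vector of ones.
    """
    # The last intercept makes the intercepts sum to one.
    b = list(b_star) + [1.0 - sum(b_star)]

    # The last row of Gamma makes every column sum to zero.
    n_cols = len(gamma_star[0])
    last_row = [-sum(row[k] for row in gamma_star) for k in range(n_cols)]
    gamma = [list(row) for row in gamma_star] + [last_row]

    # e_L = -(e_1 + ... + e_{L-1}), which fixes the last row and column.
    m = len(sigma_star)
    cross = [-sum(sigma_star[i][j] for j in range(m)) for i in range(m)]
    corner = sum(sigma_star[i][j] for i in range(m) for j in range(m))
    sigma = [list(sigma_star[i]) + [cross[i]] for i in range(m)]
    sigma.append(cross + [corner])
    return b, gamma, sigma


# Illustrative values only; they are not taken from the text.
b, gamma, sigma = complete_singular_system(
    [0.2, 0.3],
    [[1.0, -2.0], [0.5, 0.5]],
    [[2.0, 0.5], [0.5, 1.0]],
)
```

Every row of the completed covariance matrix sums to zero, so the matrix is singular, as the adding-up constraint requires. Dropping the last row and column returns the truncated parameters, which is why the map is invertible.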


4.7  Seemingly  Unrelated  Equations 


A system  of  seemingly  unrelated  equations  has  certain  simplifying 
properties.  The  model  is 


4.10    $(y_t)_i = z_{ti}^T \beta_i + (u_t)_i$    i=1,...,L; t=1,...

where $z_{ti}$ is a $K_i \times 1$ vector of exogenous variables, and $\beta_i$ is
the $K_i \times 1$ vector of unknown parameters. Let
$z_t^T = (z_{t1}^T, \ldots, z_{tL}^T)$ and $K = K_1 + \cdots + K_L$.

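The model 4.10 may be written in multivariate form as

$y_t = \begin{pmatrix} z_{t1}^T & 0 & \cdots & 0 \\ 0 & z_{t2}^T & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & z_{tL}^T \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_L \end{pmatrix} + u_t$    t=1,2,...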


The term "seemingly unrelated" refers to the fact that if the covariance
matrix of $u_t$ were diagonal for each t, the MLE of $\beta_i$ would be
a function of $(y_t)_i$, $z_{ti}$, t=1,...,n, alone. The representation
of the model as a system would not produce more efficient (in any
sense) estimators. It will be assumed in this section that the
covariance matrix of $u_t$ is not diagonal, for t=1,2,... . Assuming
the appropriate conditions on the error terms and the exogenous
variables, theorems 3.1 through 3.5 may be applied.

Recent interest has been expressed in the efficient estimation
of the parameters of seemingly unrelated equations when the error
terms satisfy an autoregressive process. Parks [1967] and Kmenta
and Gilbert [1970] consider the case where each component of
$u_t$ satisfies a first order autoregressive process

4.11    $(u_t)_j = \rho_j (u_{t-1})_j + (\varepsilon_t)_j$    t=2,3,...

        $(u_1)_j = (1 - \rho_j^2)^{-1/2} (\varepsilon_1)_j$    j=1,...,L


The variables $\varepsilon_t$ are independently distributed, with mean vector
zero and nonsingular covariance matrix. The stability condition
is assumed to hold; i.e., $|\rho_i| < 1$, for i=1,...,L. Parks
proposes a three-step estimation technique which is consistent
and asymptotically efficient in the sense of attaining the Cramer-Rao
lower bound on the covariance matrix. For this model, Kmenta
and Gilbert compare small sample efficiency for several alternative
estimators, one of which is Parks' estimator.
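The role of the starting value in the first order autoregressive scheme can be made explicit with a short derivation. The block below is a sketch under two added assumptions: that the variance of $(\varepsilon_t)_j$ is a constant, written here as $\sigma_{jj}$ (a symbol introduced only for this illustration), and that $(\varepsilon_t)_j$ is uncorrelated with $(u_{t-1})_j$, which follows from the independence of the $\varepsilon_t$. It shows that the scaling $(1-\rho_j^2)^{-1/2}$ of the first disturbance gives every $(u_t)_j$ the same variance, provided $|\rho_j| < 1$.

```latex
\begin{align*}
\operatorname{var}\big((u_1)_j\big)
  &= (1-\rho_j^2)^{-1}\,\sigma_{jj}, \\
\operatorname{var}\big((u_t)_j\big)
  &= \rho_j^2\,\operatorname{var}\big((u_{t-1})_j\big) + \sigma_{jj}
   = \rho_j^2\,\frac{\sigma_{jj}}{1-\rho_j^2} + \sigma_{jj}
   = \frac{\sigma_{jj}}{1-\rho_j^2}, \qquad t = 2,3,\ldots
\end{align*}
```

The second line is an induction step: if the variance at t-1 equals $\sigma_{jj}/(1-\rho_j^2)$, so does the variance at t.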

Guilkey and Schmidt [1973] treat the case of vector autoregressive
models

4.11    $u_t = R u_{t-1} + \varepsilon_t$    t=2,3,...

where $\{\varepsilon_t, t=1,\ldots\}$ are distributed independently with mean 0
and nonsingular covariance matrix. The stability condition is
assumed to hold; the L roots of the determinantal equation

4.12    $|\rho I - R| = 0$

are less than one in absolute value. The authors propose a six step
procedure, similar to the procedure of Parks, which produces a
consistent, asymptotically efficient (in the sense of attaining the
Cramer-Rao lower bound on the covariance matrix) estimator of $\beta$.
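The stability condition for the first order vector scheme can be checked directly, since the roots of $|\rho I - R| = 0$ are the eigenvalues of R. The sketch below handles the two-equation case with the quadratic formula, using only the standard library. The function names `eigenvalues_2x2` and `is_stable` are hypothetical, and the matrices are illustrative values, not taken from the studies cited.

```python
import cmath


def eigenvalues_2x2(r):
    """Roots of |rho*I - R| = 0 for a 2x2 matrix R (may be complex)."""
    (a, b), (c, d) = r
    trace = a + d
    det = a * d - b * c
    # rho^2 - trace*rho + det = 0
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def is_stable(r):
    """True when every root is less than one in absolute value."""
    return all(abs(root) < 1.0 for root in eigenvalues_2x2(r))


stable_real = is_stable([[0.5, 0.1], [0.2, 0.3]])
stable_complex = is_stable([[0.0, -0.5], [0.5, 0.0]])
unstable = is_stable([[1.1, 0.0], [0.0, 0.2]])
```

Complex roots are compared by modulus, which is why `cmath` is used rather than `math`.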

It should be noted that these results apply to autoregressive
error structures of order one. Using theorems 2.3, 2.4, and 1.4,
we can obtain optimal properties for the MLE for general autoregressive
error structures. Theorem 4.8 below may be applied to a system
of seemingly unrelated equations with autoregressive error
structure, or to a system of identified linear equations with
autoregressive error structure. Hendry [1971] obtained the MLE
and their asymptotic covariance matrix for a model of the latter
type.


Theorem 4.8

For model 4.1, where each equation is assumed to be identified,
or model 4.10, assume the error variables $\{u_t, t=1,\ldots\}$
satisfy

$u_t = R_1 u_{t-1} + \cdots + R_H u_{t-H} + \varepsilon_t$    t=1,...

where the $\{\varepsilon_t, t=1,\ldots\}$ are assumed to be independently normal
with mean vector zero and covariance matrix $\Sigma$. The roots of the
determinantal equation

$|\rho^H I - R_1 \rho^{H-1} - \cdots - R_H| = 0$

are assumed to be less than one in absolute value. If the
exogenous variables are nonrandom and

1) $\{z_t, t=1,\ldots\}$ is contained in a bounded set,

2) $\lim_{n\to\infty} n^{-1} \sum_{t=1}^{n} z_t z_{t+c}^T$ exists for all c = 0,1,...;
the convergence is uniform in c,

then the MLE of B, $\Gamma$, $R_1,\ldots,R_H$, and $\Sigma$ for model 4.1, or $\beta$,
$R_1,\ldots,R_H$, and $\Sigma$ for model 4.10 are consistent, asymptotically
normally distributed, and efficient in the MP sense.

Proof:

By the method of section 2.5, and by theorem 2.3, the result
follows.

Theorem 4.9

For the same models and error structure as in theorem 4.8, assume
that the exogenous variables are identically distributed, independently
of $\{\varepsilon_t, t=1,\ldots\}$. Assume also conditions 1, 2, and 3 of theorem
2.2, where x is replaced by z. Then the MLE of the parameters
of the model are consistent, asymptotically normally distributed, and
efficient in the MP sense.

Proof: 

By the method of section 2.5 and by theorem 2.4, the MLE have
the required properties.


CHAPTER V: PROBIT AND LOGIT MODELS:
TOPICS FOR FUTURE RESEARCH

5.1 Introduction

The models covered in this chapter have recently appeared in the
literature. In order to explain complicated behavior of the variables
in these models, the authors make complex distributional assumptions
for the observations. They suggest maximum likelihood as the method
of estimation, but do not conjecture or prove optimality properties.
We will examine the nature of assumptions required in order to invoke
theorems 1.1, 1.2, and 1.3. Sections 5.2, 5.3, and 5.4 deal with
variations of the probit model. Section 5.5 treats a simultaneous logit
model.

5.2 A One-Limit Probit Model

Tobin [1958] deals with the one-limit probit model, where the
underlying distribution is assumed to be normal. Amemiya and Boskin
[1974] use the lognormal as the underlying distribution for their
one-limit probit model, to account for the nonnegativity and skewness of
their dependent variable. The authors postulate that the independent
observations $\{y_t, t=1,\ldots\}$ satisfy

$y_t = z_t$ if $z_t < L$
$y_t = L$ if $z_t \ge L$,

t = 1,2,..., where the distribution of $z_t$ is lognormal, with
expectation $\theta^T x_t$, where $\theta$ is a Kx1 vector of unknown
parameters and $x_t$ is a Kx1 vector of exogenous variables, for all t.
The variance of $z_t$, for all t, is proportional to the square of
the mean, $C^2 (\theta^T x_t)^2$. The density of $z_t$, for all t, is given by

$g_t(z_t) = \frac{1}{(2\pi)^{1/2} \sigma z_t} \exp\left\{ \frac{-[\log(z_t) - \log(\theta^T x_t) + \sigma^2/2]^2}{2\sigma^2} \right\}$

where $\sigma^2 = \log(1 + C^2)$. Let F represent the normal c.d.f. where the
mean is zero and the variance is $\sigma^2$. Let f be the density of F. Let

$v_t = \log(y_t) - \log(\theta^T x_t) + \sigma^2/2$

$u_t = -\log(L) + \log(\theta^T x_t) - \sigma^2/2$.

Finally, let $S_1 = \{t: z_t \ge L\}$, and $S_2 = \{t: z_t < L\}$. Let $T_2$ be
the number of elements in $S_2$. The log likelihood function of the
observations, except for a constant with respect to the unknown
parameters, is

$\log L_n(\theta, \sigma^2) = \sum_{S_1} \log F(u_t) - (1/2) \sum_{S_2} \log(\sigma^2) - (2\sigma^2)^{-1} \sum_{S_2} v_t^2$.
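The log likelihood for the one-limit model translates into a short routine. The sketch below is a minimal illustration using only the standard library. It assumes that the values $\theta^T x_t$ are supplied directly (the argument `means`), that an observation equal to the limit is treated as censored, and that $\sigma^2 = \log(1 + C^2)$. The function names `normal_cdf` and `one_limit_loglik` are hypothetical.

```python
import math


def normal_cdf(u, sigma2):
    """C.d.f. of a normal variable with mean zero and variance sigma2."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0 * sigma2)))


def one_limit_loglik(y, means, limit, c):
    """Log likelihood, up to a constant, of observations censored at limit.

    means[t] plays the role of theta'x_t; c is the constant C in the
    variance C^2 (theta'x_t)^2.
    """
    sigma2 = math.log(1.0 + c * c)
    total = 0.0
    for y_t, mean_t in zip(y, means):
        if y_t >= limit:
            # t in S1: only the event z_t >= L is observed.
            u_t = -math.log(limit) + math.log(mean_t) - sigma2 / 2.0
            total += math.log(normal_cdf(u_t, sigma2))
        else:
            # t in S2: z_t itself is observed.
            v_t = math.log(y_t) - math.log(mean_t) + sigma2 / 2.0
            total += -0.5 * math.log(sigma2) - v_t * v_t / (2.0 * sigma2)
    return total


# Two hand-checkable cases with C = 1, so sigma^2 = log 2.
s2 = math.log(2.0)
# Censored case with u_t = 0, so the contribution is log(1/2).
censored = one_limit_loglik([2.0], [2.0 * math.exp(s2 / 2.0)], 2.0, 1.0)
# Uncensored case with v_t = 0, so the contribution is -(1/2) log(sigma^2).
interior = one_limit_loglik([math.exp(-s2 / 2.0)], [1.0], 2.0, 1.0)
```

The censored branch uses $P(z_t \ge L) = F(u_t)$, which follows because $\log z_t$ is normal with mean $\log(\theta^T x_t) - \sigma^2/2$ and variance $\sigma^2$.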


Let $F_t$, $f_t$ denote F and f, respectively, evaluated at $u_t$. The
authors compute the second derivatives of the log likelihood function.
. 3 log  L (8,  o‘) 

” J-  il 

30  36 


u f F + c'f  (f  tF  ) , 

, 2.-1  t>  art  t t t 

'no  ) I o y 5 — t— t 

o r‘-/’ftAvN  1 

V--i- 


, 2,-1  r 1 + Vt  T 

+ C io  ) — - — x xx 


-n  13"lcg  L (O^o'1’) 
n — 

n 

30^39 


4 ri  y : ( ut+°2  > ( °2f t-*  utf tFt  )"°2f tFt  | 

-(2a  n)  ). 5~y ? *i 

S.  F‘(0*J 

li  t t l 


a2  - 2v 


- (2o4n)  1 l \ = — )x 

V iSt  J 


3“log  1^(6^,  a") 


3Z(a2} 


, , 4 .-1  v ; " i,  . 2,2 1 tFt  ^ , 

(-4a  n)  l —r  ;/(u  + a ) — j-  + f 

S.  I F.  * \ i oZ 

1 v t ✓ V.  ^ 


( 3u  + 2n'tf  V 

t t 


(T^/(2o4n))  + (T.-,/(4a2n)) 


-4  -1 

-an 


l vt  - c-'n'1  I vf. 
S2  S2  ' 


To invoke theorems 1.1, 1.2, and 1.3, the expectations of terms
5.1, 5.2, and 5.3 must converge as $n \to \infty$, uniformly for $\theta$, $\sigma^2$ in any
compact set. Also, the variances must converge to zero as $n \to \infty$,
uniformly in the same sense. If the exogenous variables are nonrandom,
the randomness of the first term of 5.1 is due to the fact that the
summation is over a random set $S_1$. Let I(E) denote the indicator
function which assumes the value 1 if the event E occurs, and zero
otherwise. The summation over the random set can be written as the
summation, over all t, of $I(z_t \ge L)$ times the t'th term of the
summand. Since

$E(I(z_t \ge L)) = F_t$

the expectation of the first term of 5.1 is


2,-1  ? /utftFt  + °2ft(ft  + V , 


(na")‘x  T < 
t-1  1 


T 2 


We must assume that this term converges uniformly as $n \to \infty$ for all
$\theta$, $\sigma^2$ in a compact set. The complexity of the summation makes it
difficult to simplify the assumption further. The expectation of the
second term of 5.1 is


0 . n ; (l-log(oV  ) * o2/2)(l-F.)  + /V,  log(zWz)dz''i 

(ny  ) l ^ *tX-t 

t«l  j (.1 2it)  ; 


This term also must be assumed to converge uniformly as $n \to \infty$, for
all $\theta$, $\sigma^2$ in a compact set. The expectation of 5.2 is




4 %-l  ? /(“t  + J'‘'  + UtftFt)  " '/'VV. 

- \2o  n)  l \ 7 

t=l  i F.(fl  x ) 

V.  t * 


n , -21og(0Tx  )(1-F  ) - 2 log(z>g  (z)dz 

,^4  x “1  r / —i.  t -00  t 

- (2a  n)  L \ s 

t=l  i,  3 xt 


The expectation of 5.3 is


...  4 .-1 
(4a  n) 


I 

t=l 


L(ut 


2,; 

o ) 


:tFt  + Ut/a 


- l 3a. 


xa 


■)1 


+ (4a2)'1 


4-1  4-1  *> 

(2c)  + [(2a  ) - (4o~) 


‘V1 


n 

J/' 


The two terms above must be assumed to converge as before. Since the
$\{y_t, t=1,\ldots\}$ are independently distributed, assuming that $\{x_t, t=1,\ldots\}$
is contained in a bounded set is sufficient to conclude that the
variances of 5.1, 5.2, and 5.3 converge to zero, as $n \to \infty$, uniformly
for $\theta$, $\sigma^2$ in a compact set.

5.3 A Two-Limit Probit Model

Rosett and Nelson [1975] consider a model in which the endogenous
variable is restricted by an upper as well as a lower limit, but is
continuous between the limits. The behavior of the variable between
the limits is explained by a linear function of exogenous variables; we
will relax this specification to include nonlinear functions as well.


ions to be determined below. Let F and f be as defined in section
5.2. Let $\{L_{1t}, t=1,\ldots\}$ and $\{L_{2t}, t=1,\ldots\}$ be sequences of real
numbers with $L_{1t} < L_{2t}$ for all t. The behavior of the dependent
variables $\{y_t, t=1,\ldots\}$ is given by

> $L_{2t}$}; $S_3 = \{t: L_{1t} \le z_t \le L_{2t}\}$.

Let $u_{1t} = g(x_t; \theta) - L_{1t}$ and $u_{2t} = g(x_t; \theta) - L_{2t}$. Then the logarithm
of the likelihood function is

$\log L_n(\theta, \sigma^2) = \sum_{S_1} \log(1 - F(u_{1t})) + \sum_{S_2} \log(F(u_{2t})) + \sum_{S_3} [-\log(\sigma) - (2\sigma^2)^{-1} (y_t - g(x_t; \theta))^2]$.

\s  u'.i  Chrotc-.*  IIj  , v.*c*  v‘cv  r . . » 

1 -4. 


$g''_{ij}(x_t; \theta) = \partial^2 g(x_t; \theta) / \partial\theta_i \partial\theta_j$. The second partial derivatives of the
log likelihood function are
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u„.  ) u f(u  ) j f(u2+)  \ 
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1 
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_2  32log  Ln(0,c2) 

n 20.3a 
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S1 
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>|a(l-F(ult))2 


it ' " It 
Assuming that the exogenous variables are nonrandom, the randomness
in the summations of the first two terms of 5.5 is due to the random
sets $S_1$ and $S_2$. As before, the summation over $S_1$ may be replaced
by a summation over t from 1 to n, if the indicator function
$I(z_t < L_{1t})$ is written as a factor of each of the terms of the
summation. We have that
5.6, we note that

It must be assumed that the expectations of the third terms
converge in the sense specified previously. In order to show that the
variances of 5.4, 5.5, and 5.6 converge to zero as required, it is
absolute value uniformly for all $x_t$, that $g''_{ij}$, for i,j=1,...,K,
is bounded uniformly in absolute value for all $x_t$ and $\theta$ in any
compact set, and the sequences $\{L_{1t}\}$ and $\{L_{2t}\}$ are bounded uniformly
in t.

5.4  A Threshold  Regression  Model 

The threshold regression model of Dagenais [1975] specifies that
the value of the dependent variable y remains fixed at a value L
until the action of either of two groups of exogenous variables forces
it across either a lower or an upper threshold value. Unlike the
two-limit probit model, the dependent variable of the threshold model has
neither upper nor lower bounds. The behavior of y is described
The cross derivatives with respect to the variance terms are
either nonrandom or zero. The expectations of 5.7, 5.8, and 5.9 must
be assumed to converge as $n \to \infty$ uniformly for the parameters in any
compact set. Since the error terms were assumed to be serially
independent, the variances of the second partials converge as required when
the exogenous variables are contained in a compact set.

5.5 A Simultaneous Logit Model


Schmidt  and  Strauss  [1975]  determine  the  MLE  of  the  parameters  of 


a simultaneous  version  of  a multivariate  logit  model. 
is a Kx1 vector; Q is a Lx1 vector. Again, let I(E) represent
the indicator function of the event E. Let
In order to compactly express the second partials, the following nota-
derivatives can be expressed as follows
exogenous variables

of terms 5.10 to 5.15 uniformly in compact sets of the unknown
parameters, is sufficient to assure that the MLE are consistent,
asymptotically normally distributed, and efficient in the MP sense.